The problem of packet scheduling for traffic streams with target outflow profiles traversing input queued switches 
is formulated in this paper. Target outflow profiles specify the desirable inter-departure times of packets leaving the 
switch from each traffic stream. The goal of the switch scheduler is to dynamically select service configurations 
of the switch, so that actual outflow streams ("pulled" through the switch) adhere to their desired target profiles as 
accurately as possible. 

Dynamic service controls (schedules) are developed to minimize deviation of actual outflow streams from 
their targets and suppress stream "distortion". Using appropriately selected subsets of service configurations of the 
switch, efficient schedules are designed, which deliver high performance at relatively low complexity. Some of these 
schedules are provably shown to achieve 100% pull-throughput. Moreover, simulations demonstrate that for even 
substantial contention of streams through the switch, due to stringent/intense target outflow profiles, the proposed 
schedules achieve closely their target profiles and suppress stream distortion. 
The switch model investigated here deviates from the classical switching paradigm. In the latter, the goal of 
packet scheduling is primarily to "push" as much traffic load through the switch as possible, while controlling 
delay to traverse the switch and keeping congestion/backlogs from exploding. In the model presented here, however, 
the goal of packet scheduling is to "pull" traffic streams through the switch, maintaining desirable (target) outflow 
profiles. 

Index Terms 

Packet switching, Real-time Scheduling, Quality of Service, Dynamic Programming, Lyapunov Techniques. 

I. Introduction 

Real-time services such as multimedia streaming, video on demand, video telephony etc. continue to gain 
popularity amongst Internet users. These applications have strict quality-of-service (QoS) requirements with regard 
to packet delivery times and jitter. Scheduling algorithms employed in packet switches/routers play a key role in 
QoS provisioning for real-time Internet applications. 

While early research on packet switching focused on the output-queued (OQ) switch architecture [1], [2], input- 
queued (IQ) switches have received much attention in recent times, owing to their scalable architecture. However, 
non-trivial scheduling/arbitration algorithms are needed to resolve contention between input traffic streams to ensure 
efficient operation of an IQ switch. Most research on IQ switch scheduling has revolved around performance metrics 
like throughput and average delay, which are conceived on macro time-scales (at the mean flow level). Numerous 
scheduling algorithms based on maximum weight matching (MWM), projective cone schedules (PCS), etc. have 
been proposed in the literature [3]-[6], all of which provably guarantee 100% (push)-throughput, with varying 
degrees of average delay performance. This body of literature, while important in its own right, does not address 
the problem of QoS provisioning for time/jitter sensitive real-time traffic, which entails performance engineering 
and control of the switch on micro time-scales (at the packet level). 
In an initial effort to address the latter problem, in this paper, we develop IQ switch scheduling algorithms for 
traffic streams associated with target outflow profiles. The target profile of a traffic stream specifies the desirable 
(hence, the term "target") packet inter-departure times (IDT) of packets leaving the switch. In other words, the 
target outflow profile determines the ideal packet inter-departure times. 

In the absence of congestion, packets from each stream will depart the switch in accordance with the associated 
target profile. However, contention between competing traffic streams for the shared switch fabric causes congestion 
in the switch. Consequently, the actual departure process of a stream deviates from the ideal departure process (as 
dictated by its target outflow profile). In other words, the stream outflow gets distorted by the switch, vis-a-vis 
its target profile. Thus, the objective of the switch service scheduler is to minimize the aggregate distortion of 
the target output profiles of all streams traversing the switch. That is, the scheduler must select switch service 
traces (sequences of switch configurations) such that the actual departure/outflow profiles of streams track their 
corresponding target profiles as accurately as possible. We call this the Service Trace Control (STC) problem for 
an IQ switch. 

The motivation behind seeking a solution to the STC problem is to render packet switched networks somewhat 
"transparent" to timing/jitter sensitive multimedia traffic. The target outflow profiles are determined by the times at 
which consecutive packets need to be delivered to end users to ensure uninterrupted multimedia playout (the playout 
profile). High quality multimedia experience is provided to end-users if traffic streams negotiate routers/switches 
with minimal distortion. Note that the term "distortion" is used in this paper simply to mean the deviation of 
packet inter-departure times from their target profiles. It is not used in the deeper sense that it carries in 
information theory and coding theory. 

In our switch model, delayed packets are not dropped, but instead are penalized for violating their target packet 
inter-departure times (IDT). The switch is also penalized for being ahead of the target packet IDTs. This is done 
to prevent buffer overflows at downstream nodes (flow control) and the end-user, as well as to avoid starvation of 
best-effort traffic (i.e. without target outflow profiles) being served by the switch. This model is representative 
of half-duplex applications like lossless multimedia streaming (e.g. an online baseball game), where the end-user 
would much rather wait for a delayed packet than miss viewing the media content encoded in the delayed packet 
(which would happen if the switch drops delayed packets). 

In our framework, packets can be thought of as being associated with soft deadlines for their inter-departure 
times (IDT). Any positive deviation (exceeding the deadline) from the target IDTs manifests itself as a soft deadline 
violation, which carries a penalty/cost. The "softness" of a deadline is reflected by the cost associated with its 
violation (the lower the violation cost, the softer the deadline). On the other hand, any negative deviation from the 
target IDTs (transmitting before a target inter-departure time) is also a soft deadline violation, and carries a cost 
(e.g. for stressing downstream receivers with potential buffer overflows). The service trace control (STC) problem 
thus translates to minimization of aggregate soft deadline violation cost over all traffic streams. This is explained 
in detail in Section II. 

In the classical packet switching paradigm (see [3]-[6]), incoming traffic flows compete for switch service. The 
scheduler's objective is to control the congestion buildup (and avoid excessive backlogs), given the traffic load. 
Alternatively, the scheduler tries to maximize the inflow load that can be "pushed" through the switch, without the 
packet backlogs exploding. Hence, it tries to maximize the "push-throughput". In the switch model studied in this 
paper, the issue is very different. Packet streams are "pulled" through and out of the switch. The packets initially 
reside in input queues, organized as virtual output queues (VOQ). Recall that the scheduler's objective is now to 
pull the streams through and out of the switch, so that their outflow packet inter-departure times (IDT) deviate as 
little as possible from specified targets and the outflow stream distortion is minimized. But if the target IDTs are too 
short (outflow target profiles have high intensity) the switch may not be able to keep up and the distortion of one 
or more streams may grow excessively over time. Thus, the scheduler can now be viewed as trying to maximize 
the "pull-throughput" of the switch, i.e., supply the most intense outflow streams, while keeping their distortions 
under control. This is explained in detail in Sections II and V. 

A. Related work 

The case of scheduling periodic messages through IQ switches has been addressed in the literature. In that case, 
packets for each traffic stream are generated periodically, and the maximum time allowed for transmission of a 
packet is equal to the period of the stream. A schedule is deemed feasible if all messages meet their deadline 
requirements. Note that the periodic model is a special case of our general model, with constant inter-departure 
times (equal to the period of the stream). Inukai [7] showed that a feasible schedule can be constructed when the 
periods of all streams are equal and both input and output link utilization are less than 1. Liu et al. [8] conjectured 
that Inukai's conclusion holds for traffic streams with arbitrary periods and also proposed heuristic scheduling 
algorithms based on the earliest deadline first (EDF) and minimum laxity first (MLF) policies. The performance of 
their heuristics degrades rapidly with switch size. In support of the conjecture, Giles et al. [9] proposed the nested 
periodic scheduling (NPS) rule, which finds a feasible schedule when each period divides all longer periods and 
link utilization is less than 1. NPS also finds a feasible schedule for arbitrary message periods, provided the link 
utilization is no more than 1/4. The computational complexity of NPS is 0{N^) for an N x N switch. Rai et 
al. [10] developed heuristic weighted round robin (WRR) scheduling policies for multiclass periodic traffic, with 
an online implementation complexity of 0{N^). More recently, Lee et al. [11] proposed the Flow-based Iterative 
Packet Scheduling (FIPS) algorithm for periodic traffic with two classes, which minimizes the number of dropped 
packets when the switch is overloaded. They extended the FIPS algorithm to design efficient heuristics for arbitrary 
multiclass traffic. Their proposed algorithms outperform MLF and EDF based policies, but have a complexity of 
$O(N^{4.5})$. 

On a different strand of research, Li et al. [12] developed a frame-based scheduler with guaranteed delay and 
jitter bounds for leaky-bucket constrained traffic. Chang et al. proposed schemes for providing delay guarantees in 
IQ switches based on the Birkhoff-von Neumann (BV) decomposition of the input rate matrix in [13] and based 
on EDF for load balanced switches (see [14]) in [15]. Their schemes have an offline computational complexity of 
$O(N^{4.5})$ and an online memory requirement of 0{N^ log N). Keslassy et al. [16] proposed a frame based scheduler 
based on the BV decomposition to guarantee low jitter, under the assumption that jitter sensitive traffic forms a 
small fraction of the overall switch load. 

A common feature of the above works is that they deal with scheduling of smooth/regular traffic (completely 
characterized by a single fixed rate known to the scheduler). However, traffic arriving to a switch can be irregular 
due to the bursty nature of traffic sources (e.g. variable bit rate video), due to flow aggregation, or due to jitter 
induced by upstream switches. Further, rates of different streams are not always known to the scheduler. Also, these 
schemes have significant computational complexity, making them relatively difficult to implement in high speed 
switches. 

For completeness, we also mention two other somewhat relevant bodies of work, akin in spirit to our modeling 
approach. Our "soft deadline" point of view discussed before is reminiscent of the time/utility function (TUF) 
approach introduced by Jensen et al. [17] to study scheduling in real-time operating systems. Moreover, our notion 
of target profiles for different traffic streams is reminiscent of the rich set of network calculus tools developed 
by Cruz ([18] and several subsequent works with others) to study the problem of providing deterministic QoS 
guarantees in time-slotted virtual circuit networks, based on the notion of service curves. 

B. Contributions 

The key contributions of our work are two-fold. Firstly, we develop a novel outflow aware switching framework, 
based on the idea of shaping the switch outflow streams to match desired/target profiles. While we exclusively 
study this model in the context of an IQ switch, the core ideas are more widely applicable to any queuing system 
where competing users/jobs are associated with inter-departure time (IDT) constraints. 

Secondly, we develop relatively low complexity scheduling policies for IQ switches, using the idea of switch 
configuration subset based schedules. The idea is to partition the huge set of possible switch service configurations 
(of size N!) into smaller subsets of size each, and schedule the switch using only one subset in every time-slot. 
The resulting policies achieve relatively low complexity. The results presented here provide a substantial extension 
of the research thread initiated in [20], [21], where some early observations regarding the studied switch model 
were made. 

In contrast to the previously cited works, in our switch model we do not make any assumptions on the rate, 
periodicity etc. of traffic streams traversing the switch. We also develop a family of scheduling policies achieving 
lower complexity of 0{N'^) per time-slot, which could be manageable from an implementation point of view in 
certain practical situations. 
C. Organization of the paper 

The remainder of this paper is organized as follows: In Section II we first formulate the service trace control 
(STC) problem for minimizing stream distortion with respect to their target profiles as a finite-horizon dynamic 
program [22]. We then establish the optimality of a greedy policy for a 2 x 2 switch and explore its feasibility as 
a heuristic policy for bigger switches. Subsequently, we introduce the notion of switch configuration subset based 
STC in Section III. In Section IV we develop the notion of meta-queues, which yields an alternative view of subset 
based STC and also provides a general framework for designing different families of STC policies. In Section V 
we define the admissible region of the switch and show (using Lyapunov techniques) that subset based STCs, 
with appropriate subset selection rules, guarantee finite deviation from targets for all traffic streams, under any 
admissible load. Experimental evaluation of various proposed scheduling/STC policies in Section VI demonstrates 
high performance under various stress regimes. The paper concludes in Section VII. 

D. Notations and conventions 

Notations and conventions employed throughout the paper are summarized here for convenience. All vectors and 
sequences are denoted in boldface. For a vector $\mathbf{x}$, the $n^{\text{th}}$ element is denoted by $x_n$, and for a vector $\mathbf{x}_i$, the $n^{\text{th}}$ 
element is denoted by $x_{i,n}$. $\mathbb{N}$ denotes the set of natural numbers, $\mathbb{Z}$ denotes the set of integers, and $\mathbb{Z}_+$ denotes 
the set of non-negative integers. $\mathbf{0}$ denotes the all zeros vector and $\mathbf{1}$ denotes the all ones vector. $\mathbf{e}_i$ denotes the $i^{\text{th}}$ 
unit vector, i.e., a vector with a 1 in the $i^{\text{th}}$ location and 0's elsewhere. Further, $\mathbf{e}_0 = \mathbf{0}$. The inner product 
between two vectors $\mathbf{x}$ and $\mathbf{y}$ is denoted $\langle \mathbf{x}, \mathbf{y} \rangle$. Finally, the "big-oh" notation $f(N) = O(g(N))$ is used to indicate 
that $\exists\, c > 0$ such that $f(N) \le c\, g(N)$ for large enough $N$. 

II. Minimizing Stream Distortion 

A. Switching model 

Consider an input queued (IQ) switch with virtual output queues (VOQs) at all input ports to prevent head-of-line 
(HOL) blocking. There are $N^2$ VOQs in an $N \times N$ switch with $N$ input and $N$ output ports, as shown in 
Fig. 1. Both input and output ports are indexed $1, \ldots, N$. The $i^{\text{th}}$ VOQ stores packets destined from input port 
$\lfloor (i-1)/N \rfloor + 1$ to output port $(i-1) \bmod N + 1$ and is denoted $Q_i$. The switch operates in slotted time. Every 
input (output) port can be connected to at most one output (input) port in a time-slot. An $N \times N$ switch can be set 
into $N!$ possible configurations. Each configuration is associated with a unique configuration vector of length $N^2$. 
Let $\mathbf{v}_i = (v_{i,1}\ v_{i,2}\ \cdots\ v_{i,N^2}) \in \mathcal{V}$ denote the $i^{\text{th}}$ configuration vector, where $\mathcal{V}$ is the set of all possible configuration 
vectors. Then, $v_{i,j} = 1$ if $Q_j$ is served when the switch is set in configuration $\mathbf{v}_i$ and $v_{i,j} = 0$ otherwise. We use 
the terms configuration and configuration vector interchangeably throughout the paper. 

Example 1: Two possible configuration vectors for a 2 x 2 switch are $\mathbf{v}_1 = (1\ 0\ 0\ 1)$ and $\mathbf{v}_2 = (0\ 1\ 1\ 0)$. If 
a 2 x 2 switch is configured with configuration vector $\mathbf{v}_1$, the first (second) input port is connected to the first 
(second) output port. If the switch is configured with vector $\mathbf{v}_2$, the first (second) input port is connected to the 
second (first) output port. 
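
The VOQ indexing and the configuration vectors of Example 1 can be reproduced with a short sketch. The function names `voq_ports` and `configuration_vectors` are illustrative and not part of the paper; the enumeration is brute force over all $N!$ permutations and is only meant for small $N$.

```python
from itertools import permutations


def voq_ports(i, n):
    """Map the 1-indexed VOQ number i to its (input port, output port) pair."""
    return (i - 1) // n + 1, (i - 1) % n + 1


def configuration_vectors(n):
    """Enumerate all n! complete configuration vectors of an n x n switch.

    Each vector has length n*n; entry (a*n + b) is 1 when input port a+1
    is connected to output port b+1 (0-indexed positions).
    """
    vectors = []
    for perm in permutations(range(n)):  # perm[a] = output matched to input a
        v = [0] * (n * n)
        for inp, out in enumerate(perm):
            v[inp * n + out] = 1
        vectors.append(tuple(v))
    return vectors
```

For a 2 x 2 switch, `configuration_vectors(2)` returns the two vectors of Example 1, and `voq_ports(3, 2)` identifies the third VOQ as holding traffic from input port 2 to output port 1.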

In each time-slot, a single cell can be transferred from an input port to an output port, if those ports are connected 
in the selected switch configuration. This cell/packet resides in the VOQ associated with the input-output port pair. 
We use the terms packet and cell interchangeably. Indeed, a cell is a packet of size 1. The underlying assumption 
is that a packet of size K cells can be "broken" into K cells for individual processing, and reassembled at the 
output of the switch. 

We assume there is a large (theoretically infinite) supply of cells/packets residing at the VOQs initially, so that 
VOQs never run out of packets. For example, one may consider a switch in a video server farm, where video 
content is retrieved from hard disks and streamed via the switch to remote users. The switch VOQs are directly fed 
with video packets from the server disk and never (rarely) empty until the streamed content transmission completes. 
Analogous scenarios emerge in storage area network switches, where large files are retrieved from hard disks and 
streamed via switches to users. 

Every VOQ is associated with a traffic stream, characterized by a Target Stream Profile (TSP). The traffic 
stream's cells/packets are stored in the associated VOQ. The TSP is the desirable profile of outflow traffic, i.e., of 
the stream leaving the switch. It basically specifies the time-slots in which cells of the stream should ideally depart 
the switch. Alternatively, it characterizes the ideal time distance (number of slots) for releasing two consecutive 
cells from the stream's VOQ and getting them through and out of the switch. 

Technically, the TSP is a sequence of "0"s and "1"s which specifies the packet inter-departure time (IDT) 
targets/constraints between packets in the stream. Let 

$$\mathbf{s} = (s^1, s^2, \ldots) \tag{1}$$

denote the TSP for a typical traffic stream. Suppose that the $k^{\text{th}}$ "1" in $\mathbf{s}$ occurs at location $\tau \in \mathbb{N}$ and the $(k+1)^{\text{th}}$ 
"1" occurs at location $\tau + \delta_k$, for some $\delta_k \in \mathbb{N}$. The interpretation is that the $k^{\text{th}}$ packet in the stream should ideally 
depart the switch in the $\tau^{\text{th}}$ time-slot, the $(k+1)^{\text{th}}$ packet in the stream should depart the switch in the $(\tau + \delta_k)^{\text{th}}$ 
time-slot, and therefore the desired inter-departure time (IDT) target between the $k^{\text{th}}$ and $(k+1)^{\text{th}}$ packets of the 
stream is $(\tau + \delta_k) - \tau = \delta_k$ time-slots. From the TSP we derive the cumulative Target Stream Profile (cTSP), 
denoted $\mathbf{S} = (S^1, S^2, \ldots)$, $S^t \in \mathbb{Z}_+ \ \forall\, t$, where 

$$S^t \triangleq \sum_{\tau=1}^{t} s^{\tau}, \quad t = 1, 2, \ldots \tag{2}$$

is the number of packets of the stream which should ideally have departed the switch by the end of the $t^{\text{th}}$ time-slot. 

Example 2: We illustrate the concepts of TSP and cTSP through an example. Let the TSP of a stream be given 
by $\mathbf{s} = (0, 1, 0, 0, 1, 0, 1, \ldots)$. This implies that the 1st packet of the stream should ideally depart the switch in the 
2nd time-slot, the 2nd packet should ideally depart in the 5th time-slot, the 3rd packet should ideally depart in the 
7th time-slot, and so on. The entries of the cTSP are computed (by definition) as $S^1 = s^1$, $S^2 = s^1 + s^2, \ldots$. Thus, 
we have $\mathbf{S} = (0, 1, 1, 1, 2, 2, 3, \ldots)$. The interpretation is that no packets from this stream should have departed the 
switch by the end of the 1st time-slot, exactly one packet should have departed by the end of the 2nd time-slot, 
etc. 
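
As a minimal sketch of definitions (1) and (2), the following computes the cTSP and the IDT targets from a finite prefix of a TSP. The helper names are hypothetical, and the input is assumed to be a plain list of 0/1 entries.

```python
from itertools import accumulate


def cumulative_profile(tsp):
    """Running sum of a 0/1 profile: entry t-1 holds S^t from equation (2)."""
    return list(accumulate(tsp))


def inter_departure_targets(tsp):
    """Gaps (in time-slots) between consecutive 1s of the TSP."""
    slots = [t for t, bit in enumerate(tsp, start=1) if bit == 1]
    return [later - earlier for earlier, later in zip(slots, slots[1:])]


tsp = [0, 1, 0, 0, 1, 0, 1]             # the TSP prefix of Example 2
ctsp = cumulative_profile(tsp)          # [0, 1, 1, 1, 2, 2, 3]
idts = inter_departure_targets(tsp)     # [3, 2]
```

The target departure slots 2, 5 and 7 give IDT targets of 3 and 2 time-slots, in agreement with the interpretation of the TSP given above.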

Example 3: A special example is that of periodic traffic (of period $\delta$) with fixed inter-departure times, i.e., 
$\delta_k = \delta \ \forall\, k$. In this case, $s^t = 1$ if $t \bmod \delta = 0$, and $s^t = 0$ otherwise. Further, $S^t = \lfloor t/\delta \rfloor$. In general, the IDT 
targets may not be constant but vary substantially, for instance, because of coding dependencies of cell/packets in 
video streams, etc. 

To characterize the service provided by the switch, we associate with every stream a Received Service Trace 
(RST), also a sequence of "0"s and "1"s. This is the actual (not desired) service sequence received by the stream. 
Let 

$$\mathbf{r} = (r^1, r^2, \ldots) \tag{3}$$

denote the RST associated with a typical stream. Then, $r^{\tau} = 1$ if the switch serves a packet from the stream in 
the $\tau^{\text{th}}$ time-slot, and $r^{\tau} = 0$ otherwise. Similar to the cTSP, we derive the Cumulative Received Service Trace 
(cRST), denoted $\mathbf{R} = (R^1, R^2, \ldots)$, $R^t \in \mathbb{Z}_+ \ \forall\, t$, where 

$$R^t = \sum_{\tau=1}^{t} r^{\tau}, \quad t = 1, 2, \ldots \tag{4}$$

is the number of packets of the stream which have actually departed the switch by the end of the $t^{\text{th}}$ time-slot. 

Example 4: We illustrate the concepts of RST and cRST through an example. Consider a traffic stream with its 
TSP as given by Example 2. Now, suppose the RST for this stream is given by $\mathbf{r} = (0, 1, 0, 1, 0, 0, 0, 1, \ldots)$. The 
interpretation is that the 1st packet of the stream departed the switch in the 2nd time-slot, the 2nd packet departed 
in the 4th time-slot, the 3rd packet departed in the 8th time-slot, and so on. By definition, the cRST is constructed 
as $R^1 = r^1$, $R^2 = r^1 + r^2, \ldots$, yielding $\mathbf{R} = (0, 1, 1, 2, 2, 2, 2, 3, \ldots)$. The interpretation is that no packets from the 
stream were released from the switch by the end of the 1st time-slot, one packet was released by the end of the 
2nd time-slot, etc. 

Ideally, for every stream we desire $R^t = S^t \ \forall\, t$, which implies that every stream traverses the switch without 
experiencing any "distortion" of its target profile. However, this goal is not always realizable due to congestion 
caused by contention between competing streams for the shared switch fabric. If for a particular stream $R^t > S^t$, 
the stream has received more service than it requires to satisfy its packet inter-departure time (IDT) constraints and 
is said to be leading at time $t$. If $R^t < S^t$, the stream has received less than its desired amount of service and is 
said to be lagging at time $t$. To quantify distortion of target profiles due to congestion, we track for every traffic 
stream its deviation, denoted $\mathbf{d} = (d^1, d^2, \ldots)$, $d^t \in \mathbb{Z} \ \forall\, t$, where 

$$d^t \triangleq R^t - S^t, \quad t = 1, 2, \ldots \tag{5}$$

which quantifies the excess or deficiency in service catered to the stream by the switch as a function of time. 
A negative deviation (lag) indicates missed deadlines and is undesirable from a QoS provisioning perspective. A 
positive deviation (lead) is undesirable because it can cause buffer overflows at downstream switches and the end 
user and lead to starvation of delay tolerant flows traversing the switch. 

Example 5: We illustrate the notion of deviation through an example. Consider a traffic stream with its TSP 
as given by Example 2 and RST as given by Example 4. Recall that for this stream, the cTSP is given by 
$\mathbf{S} = (0, 1, 1, 1, 2, 2, 3, \ldots)$ and the cRST is given by $\mathbf{R} = (0, 1, 1, 2, 2, 2, 2, 3, \ldots)$. Taking an elementwise difference, 
the deviation is given by $\mathbf{d} = \mathbf{R} - \mathbf{S} = (0, 0, 0, 1, 0, 0, -1, \ldots)$. Note that the 1st packet of the stream gets served 
by the switch on time, the 2nd packet gets served one time-slot in advance, and the 3rd packet gets served one 
time-slot later than desired. The stream is therefore "leading" for one time-slot immediately after the departure of 
the 2nd packet, and is lagging in the 7th time-slot, which is the desired/target departure time of the 3rd packet in 
the stream. 
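
The deviation of equation (5) can be checked numerically on the profiles of Examples 2 and 4. This is a small sketch with hypothetical helper names; it truncates both sequences to their common length, which is an assumption made here to handle finite prefixes.

```python
from itertools import accumulate


def deviation(tsp, rst):
    """Deviation d^t = R^t - S^t over the common prefix of the two profiles."""
    horizon = min(len(tsp), len(rst))
    target = accumulate(tsp[:horizon])     # cTSP entries S^t
    received = accumulate(rst[:horizon])   # cRST entries R^t
    return [r - s for r, s in zip(received, target)]


def status(d):
    """Classify a single deviation value as in the text."""
    if d > 0:
        return "leading"
    if d < 0:
        return "lagging"
    return "on target"


tsp = [0, 1, 0, 0, 1, 0, 1]        # Example 2
rst = [0, 1, 0, 1, 0, 0, 0, 1]     # Example 4
d = deviation(tsp, rst)            # [0, 0, 0, 1, 0, 0, -1]
```

The stream is leading in the 4th time-slot and lagging in the 7th, matching Example 5.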
The ideas introduced in this section are depicted in Fig. 2. While the cTSP and cRST curves are shown to be 
"smooth" in the figure for illustration, note that for the discrete-time model studied in this paper (at most one cell 
per VOQ processed by the switch in a time-slot), the curves will look like staircase functions, with the step size 
equal to 1. 

B. Finite horizon dynamic programming (DP) formulation 

Consider a finite horizon of $T > 0$ time-slots indexed by $t \in \{1, \ldots, T\}$. Let $\mathbf{s}_i = (s_i^1, \ldots, s_i^T)$ and $\mathbf{S}_i = 
(S_i^1, \ldots, S_i^T)$ denote the first $T$ entries of the TSP and cTSP of VOQ $Q_i$, respectively. Define 

$$\mathbf{x}^t \triangleq (s_1^t, \ldots, s_{N^2}^t), \qquad \mathbf{X}^t \triangleq (S_1^t, \ldots, S_{N^2}^t). \tag{6}$$

Thus, $\mathbf{x}^t$ ($\mathbf{X}^t$) is a vector of the $t^{\text{th}}$ entries of the TSP (cTSP) of all the $N^2$ streams. We shift from the $\mathbf{s}_i$ ($\mathbf{S}_i$) 
notation to the $\mathbf{x}^t$ ($\mathbf{X}^t$) one in order to change the point of view from being focused on each individual queue/stream 
$i$ to tracking (all queues/streams at) each time-slot $t \le T$. To clarify the notation further, consider a $T \times N^2$ matrix 
with $\mathbf{x}^1, \mathbf{x}^2, \ldots, \mathbf{x}^T$ as its rows. The $i^{\text{th}}$ column of this matrix comprises the TSP entries for $Q_i$ over the time 
horizon of interest, viz. $\{1, \ldots, T\}$. In matrix terminology, the $i^{\text{th}}$ column is the transpose of the TSP of $Q_i$. On 
the other hand, the $t^{\text{th}}$ row comprises the $t^{\text{th}}$ entries of all $N^2$ traffic streams traversing the switch. Next, define 

$$\mathbf{d}^t \triangleq (d_1^t, \ldots, d_{N^2}^t) \tag{7}$$

as the state of the switch in the $t^{\text{th}}$ time-slot, where $d_i^t$ is the deviation (as defined in (5)) of the stream associated 
with VOQ $Q_i$ in the $t^{\text{th}}$ time-slot. 

Since deviations from target profiles are undesirable, they are associated with a "cost". In particular, to the $i^{\text{th}}$ 
stream we assign the cost function $\phi_i(k)$, which reflects the cost associated with a deviation $k \in \mathbb{Z}$. We assume 
the following: 

1) $\phi_i(0) = 0$ (zero deviation is desirable) 

2) $\phi_i(k)$ is non-negative and increasing in $|k|$ for both $k > 0$ and $k < 0$ (since both leads and lags are undesirable) 

3) $\phi_i(k)$ is convex (the cost associated with deviation increases at a positive rate as the deviation increases in 
magnitude) 

A sample cost function which satisfies the above properties is depicted on the right side of Fig. 2. An example of 
a cost function which we will often use in this paper is the quadratic cost function $\phi_i(k) = k^2$. Finally, let 

$$\Phi(\mathbf{d}^t) \triangleq \sum_{i=1}^{N^2} \phi_i(d_i^t) \tag{8}$$

denote the sum of the deviation costs of all VOQs. 

Remark 1: It is important to note that unlike the packet inter-departure time constraints, the cost functions are 
not an inherent part of the problem, but are instead extraneously assigned by the switch controller for the purpose 
of service trace control. Thus, the switch controller has the freedom to tune these cost functions in order to optimize 
switch performance. 

Remark 2: In our modeling framework, packets can be thought of as being associated with soft deadlines. For 
the more conventional case of strict deadlines, the "value" of a packet is constant prior to its deadline and zero 
thereafter. As a result, a packet is dropped if it has not departed the queue before its due date. In our context, where 
a typical motivating application is multimedia streaming, lossless delivery of packets is sought. The "value" of a 
packet reaches its peak at its target delivery time (as dictated by the TSP). The packet is treated as less valuable 
(but not dropped) if received either before or after its target time. In this sense, the "softness" of the deadline 
constraints for a traffic stream is quantified by the steepness of the associated cost function. 

In every time-slot, the Service Trace Controller (STC) drives the evolution of service traces for various traffic 
streams by setting the switch in one of $N!$ possible configurations (chosen from the set $\mathcal{V}$) or idling the switch. 

Definition 1: A policy $\Pi_T = \{\mathbf{v}^t \in \mathcal{V} \cup \{\mathbf{0}\},\ t = 1, \ldots, T\}$ is defined as a sequence of switch configurations 
selected by the service trace controller in time-slots $t = 1, \ldots, T$. 
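Given the initial state $\mathbf{d}^1$, we are interested in computing the optimal policy (one which minimizes the total cost 
over a finite horizon) $\Pi_T^*$ which satisfies 

$$\Pi_T^* = \arg\min_{\Pi_T} \left\{ \sum_{t=1}^{T} \Phi\!\left(\mathbf{d}^{t+1}_{\Pi_T}\right) \right\} \tag{9}$$

where $\mathbf{d}^{t}_{\Pi_T}$ denotes the state of the switch at the beginning of the $t^{\text{th}}$ time-slot under policy $\Pi_T$, so that 
$\mathbf{d}^{t+1}_{\Pi_T}$ is the deviation vector resulting from the decision made in the $t^{\text{th}}$ time-slot. We will adopt 
the methodology of dynamic programming (DP) to compute $\Pi_T^*$. 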

Suppose $\Pi_T$ chooses configuration vector $\mathbf{v}^t = \mathbf{v} = (v_1, v_2, \ldots, v_{N^2})$ in the $t^{\text{th}}$ time-slot. The deviation of the 
VOQ $Q_i$ increases by 1 at the end of the $t^{\text{th}}$ time-slot if it is served by configuration $\mathbf{v}$, i.e., $v_i = 1$. Also, the 
deviation of $Q_i$ decreases by 1 if its TSP has a non-zero entry in the $t^{\text{th}}$ location, i.e., $s_i^t = 1$. Note that $x_i^t$ 
(the $i^{\text{th}}$ component of $\mathbf{x}^t$) is simply $s_i^t$, from (6). More compactly, the new deviation vector at the beginning of the 
$(t+1)^{\text{th}}$ time-slot is given by 

$$\mathbf{d}^{t+1}_{\Pi_T} = \mathbf{d}^{t}_{\Pi_T} + \mathbf{v}^t - \mathbf{x}^t. \tag{10}$$

Let $V^t(\mathbf{d})$ be the cost incurred by $\Pi_T^*$ over time-slots $t, \ldots, T$, starting in state $\mathbf{d}$ at the beginning of the $t^{\text{th}}$ 
time-slot. In dynamic programming terminology, $V^t$ is referred to as the cost-to-go function, and is recursively 
computed from the following DP equations for $t = 1, \ldots, T$ 

$$V^t(\mathbf{d}) = \min_{\mathbf{v} \in \mathcal{V} \cup \{\mathbf{0}\}} \Big\{ \underbrace{V^{t+1}(\mathbf{d} + \mathbf{v} - \mathbf{x}^t)}_{\text{Cost-to-go in the next time-slot}} + \underbrace{\Phi(\mathbf{d} + \mathbf{v} - \mathbf{x}^t)}_{\text{Instantaneous cost}} \Big\} \tag{11}$$

and the boundary conditions $V^{T+1}(\mathbf{d}) = 0 \ \forall\, \mathbf{d}$. We will henceforth refer to $\mathbf{d} - \mathbf{x}^t$ as the deviation vector. 
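
To make the recursion (11) concrete, here is a minimal sketch for a 2 x 2 switch with quadratic costs, evaluated by memoized recursion, together with a greedy evaluation that minimizes only the instantaneous cost. All names are illustrative; the quadratic cost, the four-slot input sequence and the tie-breaking rule (idle first) are assumptions chosen for the example.

```python
from functools import lru_cache

# Idle decision plus the two complete configurations of a 2 x 2 switch.
DECISIONS = ((0, 0, 0, 0), (1, 0, 0, 1), (0, 1, 1, 0))


def total_cost(d):
    """Sum of quadratic deviation costs, i.e. Phi(d) with phi_i(k) = k^2."""
    return sum(k * k for k in d)


def step(d, v, x):
    """State update (10): d + v - x, elementwise."""
    return tuple(di + vi - xi for di, vi, xi in zip(d, v, x))


def optimal_cost(d1, xs):
    """Cost-to-go V^1(d1) from recursion (11); xs[t-1] plays the role of x^t."""
    horizon = len(xs)

    @lru_cache(maxsize=None)
    def cost_to_go(t, d):
        if t == horizon + 1:          # boundary condition V^{T+1}(d) = 0
            return 0
        best = None
        for v in DECISIONS:
            nxt = step(d, v, xs[t - 1])
            candidate = cost_to_go(t + 1, nxt) + total_cost(nxt)
            if best is None or candidate < best:
                best = candidate
        return best

    return cost_to_go(1, tuple(d1))


def myopic_cost(d1, xs):
    """Total cost when each decision minimizes only the instantaneous cost."""
    d, cost = tuple(d1), 0
    for x in xs:
        # min() keeps the first minimizer, so ties are resolved in favor of idling.
        d = min((step(d, v, x) for v in DECISIONS), key=total_cost)
        cost += total_cost(d)
    return cost


xs = [(0, 0, 0, 0), (1, 0, 0, 0), (0, 0, 0, 0), (1, 0, 0, 1)]
start = (0, 0, 0, 0)
```

On this small instance both `optimal_cost(start, xs)` and `myopic_cost(start, xs)` evaluate to 3. The number of reachable states grows quickly with the horizon and with $N$, so this brute-force evaluation is only practical for toy cases.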

C. Myopic/Greedy service trace control 

Observe from (11) that the optimal decision in state $\mathbf{d}$ in the $t^{\text{th}}$ time-slot is determined by the cost-to-go in the 
$(t+1)^{\text{th}}$ time-slot, as well as the instantaneous cost. Now consider a myopic policy, which is "greedy" with respect 
to the instantaneous cost, i.e., ignores the cost-to-go in the next time-slot while making its current scheduling 
decision. In particular, the myopic policy chooses configuration $\mathbf{v}^t$ in the $t^{\text{th}}$ time-slot in state $\mathbf{d}$ such that 

$$\mathbf{v}^t = \arg\min_{\mathbf{v} \in \mathcal{V} \cup \{\mathbf{0}\}} \left\{ \Phi(\mathbf{d} + \mathbf{v} - \mathbf{x}^t) \right\}. \tag{12}$$

In general, the myopic policy need not be optimal. However, for the scheduling problem at hand, the myopic policy 
is provably optimal for the case $N = 2$. For $N = 3, 4$, numerical analysis reveals that the myopic policy is close to 
optimal. The cost of computing the optimal policy becomes prohibitive as $N$ gets bigger ($N! + 1$ possible decisions 
need to be evaluated in every possible state of the switch over a period of $T$ time-slots). 

Theorem 1: The optimal finite horizon policy $\Pi_T^*$ for a 2 x 2 switch ($N = 2$) is myopic. 

Proof: See Appendix VIII-A. ■ 

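Example 6: For concreteness and as a key example, let us assign quadratic cost functions to all traffic streams, 
i.e., $\phi_i(k) = k^2 \ \forall\, i$. For any $\mathbf{v} \in \mathcal{V} \cup \{\mathbf{0}\}$ we have 

$$\Phi(\mathbf{d} + \mathbf{v} - \mathbf{x}^t) = \underbrace{\langle \mathbf{d} - \mathbf{x}^t, \mathbf{d} - \mathbf{x}^t \rangle}_{\text{Policy independent}} + \underbrace{\langle \mathbf{v}, \mathbf{v} \rangle + 2 \langle \mathbf{d} - \mathbf{x}^t, \mathbf{v} \rangle}_{\text{Policy dependent}}. \tag{13}$$

The myopic policy in this case reduces to 

$$\mathbf{v}^t = \begin{cases} \mathbf{v}^* & ; \ 2 \langle \mathbf{d} - \mathbf{x}^t, \mathbf{v}^* \rangle + N < 0 \\ \mathbf{0} & ; \ \text{else,} \end{cases} \tag{14}$$

where $\mathbf{v}^* = \arg\min_{\mathbf{v} \in \mathcal{V}} \left\{ \langle \mathbf{d} - \mathbf{x}^t, \mathbf{v} \rangle \right\}$. 

The idea is as follows: We want to find a $\mathbf{v} \in \mathcal{V} \cup \{\mathbf{0}\}$ which minimizes the policy dependent part. The set $\mathcal{V} \cup \{\mathbf{0}\}$ is the 
set of all switch configurations plus the zero configuration (switch idle). For $\mathbf{v} = \mathbf{0}$, the policy dependent part 
is 0. For all non-zero configurations, $\langle \mathbf{v}, \mathbf{v} \rangle = N$, and the policy dependent part is $2 \langle \mathbf{d} - \mathbf{x}^t, \mathbf{v} \rangle + N$. This term is 
minimized by $\mathbf{v}^*$ (by definition). Thus, we pick the $\mathbf{v}$ which attains the min of 0 and $2 \langle \mathbf{d} - \mathbf{x}^t, \mathbf{v}^* \rangle + N$. 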
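
The decision rule for quadratic costs can be sketched as follows for a general $N \times N$ switch. The function name `myopic_quadratic` is hypothetical, and the minimizing configuration is found here by brute force over all $N!$ permutations purely for illustration; since the objective is linear in the configuration vector, the same minimizer is a minimum-weight matching and could be obtained with an assignment solver instead.

```python
from itertools import permutations


def myopic_quadratic(d, x, n):
    """Greedy decision for quadratic costs on an n x n switch.

    d and x are length n*n sequences (current deviations and current TSP
    entries). Returns the chosen configuration vector, or all zeros to idle.
    """
    w = [di - xi for di, xi in zip(d, x)]  # the deviation vector d - x^t

    def weight(perm):
        # Inner product of (d - x^t) with the configuration encoded by perm,
        # where perm[a] is the output port (0-indexed) matched to input a.
        return sum(w[a * n + perm[a]] for a in range(n))

    best = min(permutations(range(n)), key=weight)
    chosen = [0] * (n * n)
    if 2 * weight(best) + n < 0:   # serving beats idling only in this case
        for a in range(n):
            chosen[a * n + best[a]] = 1
    return tuple(chosen)
```

For instance, with `d = (-1, 0, 0, 0)` and `x = (1, 0, 0, 1)` on a 2 x 2 switch, the deviation vector is `(-2, 0, 0, -1)`, the best configuration has inner product -3, and the rule serves the first and fourth VOQs. With `d = (0, 0, 0, 0)` and `x = (1, 0, 0, 0)` the two options tie and the rule idles.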
D. Partial configurations 

We begin this section with a definition. 

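Definition 2: For an $N \times N$ IQ switch, a switch configuration $\mathbf{v} \in \mathcal{V}$ is called complete if $\langle \mathbf{v}, \mathbf{1} \rangle = N$ and is called partial if $\langle \mathbf{v}, \mathbf{1} \rangle < N$.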

In other words, in a complete configuration, every input port is connected to an output port, while in a partial 
configuration, some of the input ports may be idle. 

So far we have assumed that the service trace controller (STC) either selects a complete configuration (every input
port is connected to an output port) or idles the switch. However, operating the switch using complete configurations 
only is not sufficient to exercise individual control on service traces of different streams, as illustrated by the next 
example. 

Example 7: Consider a 2 x 2 switch where the streams for $Q_1$ and $Q_4$ are periodic with periods 2 and 4 respectively, and $Q_2$ and $Q_3$ are empty (no new arrivals). In our notation, this translates to $\mathbf{x}^1 = (0, 0, 0, 0)$, $\mathbf{x}^2 = (1, 0, 0, 0)$, $\mathbf{x}^3 = (0, 0, 0, 0)$, $\mathbf{x}^4 = (1, 0, 0, 1)$ and $\mathbf{x}^{t+4} = \mathbf{x}^t \ \forall\, t$. The myopic policy (which is optimal) given by (14) selects either $\mathbf{v}_1 = (1\ 0\ 0\ 1)$ or $\mathbf{0}$ in every time-slot. The configuration vector $\mathbf{v}_2$ is never selected because both queues serviced by $\mathbf{v}_2$ are empty. It is easily verified that either the lag of $Q_1$ or the lead of $Q_4$ grows without bound under the optimal policy.
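The unbounded growth claimed in Example 7 can be checked numerically. The following is a minimal sketch, assuming that deviations start at zero and evolve as d <- d + v - x^t (our reading of the argument of the cost function in the myopic rule); the function name `simulate_example7` is hypothetical.

```python
def simulate_example7(slots):
    """Run the quadratic-cost myopic rule with complete configurations only
    on the 2 x 2 switch of Example 7 (assumes zero initial deviations)."""
    v1, v2 = (1, 0, 0, 1), (0, 1, 1, 0)
    n_ports = 2
    d = [0, 0, 0, 0]
    for t in range(1, slots + 1):
        # TSP entries: Q1 has period 2, Q4 has period 4, Q2 and Q3 are empty.
        x = (int(t % 2 == 0), 0, 0, int(t % 4 == 0))
        dev = [di - xi for di, xi in zip(d, x)]
        inner = lambda c: sum(a * b for a, b in zip(dev, c))
        v = min((v1, v2), key=inner)
        if 2 * inner(v) + n_ports >= 0:
            v = (0, 0, 0, 0)          # idling is at least as cheap
        d = [a + b for a, b in zip(dev, v)]
    return d

d = simulate_example7(400)
```

Because v1 always serves Q1 and Q4 together, the gap d[3] - d[0] equals the difference in target departures (100 after 400 slots), so the lag of Q1 and the lead of Q4 cannot both stay small.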

To exercise individual control over service traces, we allow the STC to use partial configurations. Suppose complete configuration $\mathbf{v} \in \mathcal{V}$ serves VOQs indexed by the set $\mathcal{I} = \{i_1, \ldots, i_N\}$. Any partial configuration $\hat{\mathbf{v}}$ extracted from $\mathbf{v}$ is characterized by a vector $\boldsymbol{\xi} = (\xi_1, \ldots, \xi_N)$, where $\xi_j = 1$ if $\hat{\mathbf{v}}$ serves $Q_{i_j}$ and $\xi_j = 0$ if $\hat{\mathbf{v}}$ idles $Q_{i_j}$. Thus, $2^N$ partial configurations can be extracted from any complete configuration.

Example 8: Consider configuration $\mathbf{v}_1 = (1\ 0\ 0\ 1)$ for a 2 x 2 switch. This configuration serves VOQs $Q_1$ and $Q_4$. The partial configuration set $\{(1\ 0\ 0\ 0), (0\ 0\ 0\ 1), \mathbf{0}, \mathbf{v}_1\}$ can be extracted from the complete configuration $\mathbf{v}_1$. The first partial configuration in the set corresponds to $\boldsymbol{\xi} = (1, 0)$, the second partial configuration corresponds to $\boldsymbol{\xi} = (0, 1)$, etc. Note that the configurations $\mathbf{0}$ and $\mathbf{v}$ are always part of the configuration set associated with complete configuration $\mathbf{v}$.
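The extraction of partial configurations can be illustrated with a short enumeration. This is a sketch only; the helper name `partial_configurations` and the tuple representation of configuration vectors are assumptions made for illustration.

```python
from itertools import product

def partial_configurations(v):
    """Enumerate all 2^N partial configurations extractable from the
    complete configuration v (a 0/1 tuple of length N^2)."""
    served = [q for q, bit in enumerate(v) if bit == 1]   # VOQs served by v
    result = []
    for xi in product((0, 1), repeat=len(served)):        # one bit per served VOQ
        partial = [0] * len(v)
        for keep, q in zip(xi, served):
            partial[q] = keep
        result.append(tuple(partial))
    return result

parts = partial_configurations((1, 0, 0, 1))
```

For the configuration of Example 8 this yields the four vectors listed there, including the all-zero configuration and the complete configuration itself.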

E. The Maximum Sum of Lags (MSL) policy 

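Let us revisit the myopic service trace control policy for the case of quadratic cost functions (Example 6), allowing for partial configurations this time. Recall from (13) that the policy dependent part in $\Phi$ is $\varsigma(\mathbf{v}) = \langle \mathbf{v}, \mathbf{v} \rangle + 2 \langle \mathbf{d} - \mathbf{x}^t, \mathbf{v} \rangle$. If $\mathbf{v}$ serves VOQs indexed by set $\mathcal{I}$, $\varsigma(\mathbf{v})$ can be rewritten as $\sum_{j \in \mathcal{I}} v_j^2 + 2 \sum_{j \in \mathcal{I}} v_j (d_j - x_j^t)$.

Since $\mathbf{v}$ is a complete configuration, $v_j = 1 \ \forall\, j \in \mathcal{I}$. Now, split $\mathcal{I}$ into two disjoint subsets, $\mathcal{I}_+$ and $\mathcal{I}_-$, where $\mathcal{I}_+ = \{j \in \mathcal{I} : d_j - x_j^t \geq 0\}$ and $\mathcal{I}_- = \{j \in \mathcal{I} : d_j - x_j^t < 0\}$. Note that $\mathcal{I}_+ \cup \mathcal{I}_- = \mathcal{I}$ and $\mathcal{I}_+ \cap \mathcal{I}_- = \emptyset$. Each $j \in \mathcal{I}_+$ contributes $1 + 2(d_j - x_j^t) \geq 1$ to $\varsigma(\mathbf{v})$, so $\varsigma(\mathbf{v})$ can be strictly decreased by setting $v_j = 0 \ \forall\, j \in \mathcal{I}_+$. Doing so is equivalent to extracting a partial configuration from $\mathbf{v}$ by idling all VOQs with non-negative deviation. We therefore get the following two-step service trace control policy, which we refer to as the Maximum Sum of Lags (MSL) policy (see Table I).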

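1) Select $\mathbf{v}^* = \arg\min_{\mathbf{v} \in \mathcal{V}} \left\{ \langle \mathbf{d} - \mathbf{x}^t, \mathbf{v} \rangle \right\}$.

2) Extract a partial configuration from $\mathbf{v}^*$ by idling all VOQs with non-negative deviation.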

The name of the policy arises from the fact that it selects the switch configuration whose associated VOQs have the largest sum "lag" (as defined in Section II-A).
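The two MSL steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: Step 1 is done by brute force over all N! permutations (practical only for small N) rather than by a maximum weight matching algorithm, VOQs are 0-indexed as q = i*n + j for input i and output j, and `dev[q]` holds the deviation $d_q - x_q^t$.

```python
from itertools import permutations

def msl_step(dev, n):
    """One MSL decision for an n x n switch; returns the list of VOQs served."""
    best, best_weight = None, None
    for pi in permutations(range(n)):                 # every complete configuration
        voqs = [i * n + pi[i] for i in range(n)]      # input i -> output pi[i]
        weight = sum(dev[q] for q in voqs)            # inner product <d - x^t, v>
        if best_weight is None or weight < best_weight:
            best, best_weight = voqs, weight
    # Step 2: extract a partial configuration by idling non-lagging VOQs.
    return [q for q in best if dev[q] < 0]
```

For instance, `msl_step([-2, 0, 0, 1], 2)` selects the configuration serving VOQs 0 and 3 and then idles VOQ 3, which is leading.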

The computational complexity of MSL is $O(N^3)$ per time-slot, since Step 1 involves a maximum weight matching (MWM) computation on a bipartite graph [19]. Note that the edge weights used to compute this matching are in fact the deviations associated with the VOQs. Switching algorithms which use VOQ backlogs as the edge weights for computing MWM have been studied extensively in the literature, in the context of throughput maximizing switches (e.g. [3]). While $O(N^3)$ complexity is a significant improvement over the optimal policy, algorithms to compute the maximum weight matching are cumbersome to implement and impractical for large switches. This motivates us to explore service trace control policies which yield MSL-like performance at manageable complexity.

Remark 3: Step 2 of MSL can be generalized to construct a broader class of policies, namely MSL($\ell$), indexed by $\ell \in \mathbb{N} \cup \{0\}$. Under MSL($\ell$),¹ a VOQ served by the chosen complete configuration is idled only when its deviation is $\ell$ or more. By this token, MSL = MSL(0).

III. Subset Based Service Trace Control 

To address the issue of high computational complexity associated with optimal service trace control, we propose a subset based control approach in this section. The key idea is to partition the configuration set $\mathcal{V}$ of size $N!$ into smaller disjoint subsets of size $N$ each and operate the switch using configurations from only one of these subsets in any time-slot.

A. Subset construction 

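It follows by design that all configuration vectors for an IQ switch are of the form $\mathbf{v} = [\mathbf{e}_{\pi(1)}\ \mathbf{e}_{\pi(2)} \cdots \mathbf{e}_{\pi(N)}]$, where $\pi$ is a permutation of $\{1, 2, \ldots, N\}$ and $\mathbf{e}_j$ is as defined in Section II-D. Now, we define the circular shift operator.

Definition 3: The circular shift operator $\mathcal{C} : \mathcal{V} \mapsto \mathcal{V}$ is given by

$$\mathcal{C}(\mathbf{v}) = [\mathbf{e}_{\pi(N)}\ \mathbf{e}_{\pi(1)}\ \mathbf{e}_{\pi(2)} \cdots \mathbf{e}_{\pi(N-1)}], \tag{15}$$

where $\mathbf{v} = [\mathbf{e}_{\pi(1)}\ \mathbf{e}_{\pi(2)} \cdots \mathbf{e}_{\pi(N)}] \in \mathcal{V}$ is a switch configuration vector.

Recursively define $\mathcal{C}^k(\mathbf{v}) = \mathcal{C}(\mathcal{C}^{k-1}(\mathbf{v}))$, $k \in \mathbb{N}$, which corresponds to applying the circular shift operator $k$ times to $\mathbf{v}$. By convention, $\mathcal{C}^0(\mathbf{v}) = \mathbf{v}$. Also, note that $\mathcal{C}^k(\mathbf{v}) = \mathcal{C}^{k \bmod N}(\mathbf{v})$. Thus, starting with any configuration vector $\mathbf{v} \in \mathcal{V}$, we can generate a set of $N$ distinct configuration vectors by applying the operator $\mathcal{C}$ to $\mathbf{v}$ $N - 1$ times. We say that $\mathbf{v}$ generates the configuration subset

$$\mathcal{S}_{\mathbf{v}} = \{\mathbf{v}, \mathcal{C}(\mathbf{v}), \ldots, \mathcal{C}^{N-1}(\mathbf{v})\} \subset \mathcal{V} \tag{16}$$

and refer to $\mathbf{v}$ as the generator vector. Following the outlined procedure, we can partition $\mathcal{V}$ into $(N - 1)!$ disjoint configuration subsets of size $N$ each. As an example, the configuration subsets for a 3 x 3 switch are depicted in Fig. 3.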

For any $\mathbf{v} \in \mathcal{V}$, $\langle \mathbf{v}, \mathcal{C}^k(\mathbf{v}) \rangle = 0 \ \forall\, k \in \mathbb{N}$ with $k \bmod N \neq 0$. Physically, this implies that no VOQ is served by more than one configuration in a subset. Geometrically, this means that all configuration vectors within a subset are "orthogonal" to each other. We therefore say that the generated subsets are orthogonal. Also, note that every VOQ is served by some configuration within a subset. Consequently, we say that every subset is complete. Combining the orthogonality and completeness properties we see that every VOQ is associated with exactly one configuration vector in every subset, implying

$$\sum_{j=0}^{N-1} \mathcal{C}^j(\mathbf{v}) = \mathbf{1} \quad \forall\, \mathbf{v} \in \mathcal{V}. \tag{17}$$
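The subset construction and its two properties can be checked with a small sketch. Configurations are represented here by 0-indexed permutation tuples (`pi[i]` is the output connected to input i); this representation and the helper names are assumptions made for illustration.

```python
from itertools import permutations

def shift(pi):
    """Circular shift: [e_pi(1) ... e_pi(N)] -> [e_pi(N) e_pi(1) ... e_pi(N-1)]."""
    return (pi[-1],) + tuple(pi[:-1])

def subset(pi):
    """Configuration subset generated by pi, as a list of N permutations."""
    out, cur = [], tuple(pi)
    for _ in range(len(pi)):
        out.append(cur)
        cur = shift(cur)
    return out

def to_vector(pi):
    """0/1 configuration vector of length N^2 for permutation pi."""
    n = len(pi)
    v = [0] * (n * n)
    for i, j in enumerate(pi):
        v[i * n + j] = 1
    return v

def partition(n):
    """Partition all N! configurations into (N-1)! subsets of size N."""
    seen, parts = set(), []
    for pi in permutations(range(n)):
        if pi not in seen:
            s = subset(pi)
            seen.update(s)
            parts.append(s)
    return parts
```

Summing the vectors of any generated subset gives the all-ones vector, which is the orthogonality and completeness statement in one line.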

B. The MSL-SS policy 

Let $\mathcal{S}_{\mathbf{v}} = \{\mathcal{C}^i(\mathbf{v})\}_{i=0}^{N-1}$ be the configuration subset generated by $\mathbf{v}$. Now consider operating the switch such that the service trace controller is allowed to choose configurations from $\mathcal{S}_{\mathbf{v}}$ alone, rather than from $\mathcal{V}$. In particular, consider a restriction of the MSL policy of Section II-E to the configuration subset $\mathcal{S}_{\mathbf{v}}$. We get the following two-step policy, which we call the Maximum Sum of Lags - Single Subset (MSL-SS) policy:

1) Select configuration $\mathcal{C}^{i^*}(\mathbf{v}) \in \mathcal{S}_{\mathbf{v}}$ such that

$$i^* = \arg\min_{i = 0, \ldots, N-1} \left\{ \langle \mathbf{d} - \mathbf{x}^t, \mathcal{C}^i(\mathbf{v}) \rangle \right\}. \tag{18}$$

2) Extract a partial configuration from $\mathcal{C}^{i^*}(\mathbf{v})$ by idling all VOQs with non-negative deviation.

¹Note that it may not be feasible to realize MSL($\ell$) for arbitrary $\ell > 0$, if the switch cannot provide a lead of $\ell$ even in the absence of congestion, due to unavailability of packets ahead of their departure times. However, MSL($\ell$) is pertinent in a scenario where the switch resides at the egress of a multimedia server, where all traffic streams are pre-cached at the input of the switch. In this case, the switch can furnish a lead of up to $\ell$ to provide a "cushion" against possible congestion in the downstream network.



Fig. 3. Two configuration subsets (of size 3 each) for a 3 x 3 switch. The three leftmost configurations are generated by $\mathbf{v}_1 = [\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3]$ and the three rightmost configurations are generated by $\mathbf{v}_2 = [\mathbf{e}_1\ \mathbf{e}_3\ \mathbf{e}_2]$.



The per time-slot computational complexity for MSL-SS is $O(N^2)$, in contrast to $O(N^3)$ for MSL.

Remark 4: To compute the optimal decision for MSL-SS, $N$ inner products of the form $\langle \mathbf{d} - \mathbf{x}^t, \mathcal{C}^i(\mathbf{v}) \rangle$ need to be computed, followed by a min of the resulting $N$ numbers. Each of these inner products involves vectors of length $N^2$. However, note that one of the vectors involved in each inner product is a configuration vector, which is relatively sparse (only $N$ out of the $N^2$ entries are non-zero). Further, all the non-zero entries are equal to 1. Each inner product, $\langle \mathbf{d} - \mathbf{x}^t, \mathcal{C}^i(\mathbf{v}) \rangle$, is therefore simply a sum of $N$ numbers. The MSL-SS policy is thus straightforward to implement, compared to algorithms used for computing maximum weight matching (needed for MSL).
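The computation described in Remark 4 amounts to N sums of N numbers each. A minimal sketch follows, assuming 0-indexed VOQs q = i*n + j, a generator given as a permutation tuple `gen`, and `dev[q]` holding $d_q - x_q^t$; the function name is hypothetical.

```python
def msl_ss_step(dev, gen):
    """One MSL-SS decision on the subset generated by `gen`;
    returns the list of VOQs served."""
    n = len(gen)
    best, best_weight = None, None
    for k in range(n):
        # The k-fold circular shift of gen connects input i to gen[(i - k) % n].
        voqs = [i * n + gen[(i - k) % n] for i in range(n)]
        weight = sum(dev[q] for q in voqs)            # a sum of N numbers
        if best_weight is None or weight < best_weight:
            best, best_weight = voqs, weight
    return [q for q in best if dev[q] < 0]            # idle non-lagging VOQs
```

The double loop makes the $O(N^2)$ per-slot cost explicit: N candidate configurations, N additions each.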
Two crucial questions arise at this point: 

1) What is the performance loss (if any) incurred by operating the switch using only one configuration subset? 

2) Can we compensate for the loss (if needed), without sacrificing the advantage of low complexity? 
We will address these questions in the remainder of the paper. 

IV. Meta-Queue Based Service Trace Control 

In this section, we study subset based service trace control in a broader framework, based on the notion of meta-queues. We will recover the MSL-SS policy proposed in Section III-B as a special case of the meta-queue framework.

A. Meta-queue construction 

Setting the switch in the complete configuration given by $\mathbf{v} = [\mathbf{e}_{\pi(1)} \ldots \mathbf{e}_{\pi(N)}]$ is equivalent to serving VOQs indexed by the set $\mathcal{I} = \{(i - 1)N + \pi(i),\ i = 1, \ldots, N\}$. Thus, every complete configuration serves $N$ VOQs concurrently, which we "group" together to form a meta-queue.

Let us focus on a single subset, say $\mathcal{S}_{\mathbf{v}}$. Since $\mathcal{S}_{\mathbf{v}}$ is orthogonal and complete by construction, each configuration in $\mathcal{S}_{\mathbf{v}}$ can be associated with a unique meta-queue, constructed by "grouping" $N$ distinct VOQs. Note that all $N^2$ VOQs are assigned to some meta-queue, each one exactly once. The head of line (HOL) meta-packet of a meta-queue is constructed by grouping the HOL packets of its $N$ constituent VOQs. With this construction, choosing a switch configuration is equivalent to serving the HOL meta-packet of the corresponding meta-queue.

While grouping concurrently served VOQs to form a meta-queue seems quite natural, the relation between the deviation of a meta-queue and the deviations of its constituent VOQs is not immediately evident. In fact, we have the freedom to choose a mapping $\Gamma : \mathbb{Z}^N \mapsto \mathbb{Z}$, which relates the deviation of a meta-queue to the deviations of its $N$ constituent VOQs. Given a mapping $\Gamma$, the problem of subset based control of an IQ switch turns into a problem of scheduling $N$ parallel meta-queues on a single server. The latter is an important and interesting scheduling problem in its own right (e.g. see [23]).

We now briefly digress from the service trace control problem for the IQ switch to study the single server scheduling problem mentioned above. Subsequently, we will show that by appropriately choosing $\Gamma$, one can construct good, low complexity service trace control policies for an IQ switch.

B. The single server scheduling problem

The formulation is similar in spirit to the formulation for an IQ switch (Section II-B), and so is the notation. Consider a system comprised of $N + 1$ parallel meta-queues and a single server. The $i$-th meta-queue is denoted $\mathcal{M}_i$, $i = 0, \ldots, N$. In every time-slot the scheduler serves the HOL meta-packet of one of the meta-queues, chosen according to some scheduling policy. While $\mathcal{M}_1, \ldots, \mathcal{M}_N$ are "physical" meta-queues, $\mathcal{M}_0$ is a "dummy" meta-queue, scheduling which is tantamount to idling the server. Each meta-queue is associated with a traffic stream characterized by a target service profile (TSP). The interpretation of the TSP in this context is identical to Section II-A, i.e., it specifies the time-slots in which meta-packets from a meta-queue should ideally depart the server. The TSP associated with $\mathcal{M}_0$ has all zero entries. We denote by $\mathbf{d}^t = (d_1^t, \ldots, d_N^t)$ the deviation vector for the system in the $t$-th time-slot, where $d_i^t$ is the deviation for $\mathcal{M}_i$. Define $\mathbf{x}^t = (s_1^t, \ldots, s_N^t)$ and $\mathbf{X}^t = (S_1^t, \ldots, S_N^t)$, where $s_i^t$ and $S_i^t$ are respectively the $t$-th elements of the TSP and cumulative TSP (cTSP) of $\mathcal{M}_i$. To $\mathcal{M}_i$ we assign the cost function $\psi_i(k)$, which quantifies the cost of deviation $k \in \mathbb{Z}$. Similar to Section II-B, we assume that $\psi_i(k)$ is non-negative, convex, and increasing for both $k > 0$ and $k < 0$, and $\psi_0(k) = 0 \ \forall\, k$. Finally, let

$$\Psi(\mathbf{d}^t) \triangleq \sum_{i=1}^{N} \psi_i(d_i^t) \tag{19}$$

denote the sum of deviation costs of all meta-queues.

We confine our attention to a finite horizon of $T$ time-slots. At the beginning of every time-slot, the scheduler selects one of the $N + 1$ meta-queues for service. The configuration vector corresponding to scheduling $\mathcal{M}_i$ is $\mathbf{e}_i$. An admissible policy $\Pi_T$ for the single server scheduling problem is a sequence of scheduling decisions $\{i_t\}_{t=1}^{T}$, corresponding to scheduling meta-queue $\mathcal{M}_{i_t}$ in the $t$-th time-slot. Let $\mathbf{d}^t_{\Pi_T}$ denote the deviation vector at the beginning of the $t$-th time-slot under scheduling policy $\Pi_T$. Our goal is to compute the optimal finite horizon policy $\Pi_T^*$ which satisfies

$$\Pi_T^* = \arg\min_{\Pi_T} \left\{ \sum_{t=1}^{T} \Psi\left(\mathbf{d}^{t+1}_{\Pi_T}\right) \right\}. \tag{20}$$

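We specify the state of the system at the end of the $t$-th time-slot by

$$\mathbf{n}^t = (n_1^t, \ldots, n_N^t), \tag{21}$$

where $n_i^t$ is the number of times $\mathcal{M}_i$ has been served within the first $t$ time-slots. Since the server is allowed to idle, $\langle \mathbf{n}^t, \mathbf{1} \rangle \leq t \ \forall\, t$. The system state and deviation vector are uniquely related by

$$\mathbf{d}^{t+1} = \mathbf{d}^0 + \mathbf{n}^t - \mathbf{X}^t. \tag{22}$$

If the state at the beginning of the $t$-th time-slot is $\mathbf{n}$ and the scheduler chooses $\mathcal{M}_i$ in the $t$-th time-slot, the new state at the beginning of the $(t+1)$-th time-slot is $\mathbf{n} + \mathbf{e}_i$. Letting $V^t(\mathbf{n})$ denote the cost-to-go at the beginning of the $t$-th time-slot in state $\mathbf{n}$, we have the following DP equations for $t = 1, \ldots, T$

$$V^t(\mathbf{n}) = \min_{i = 0, \ldots, N} \Big\{ V^{t+1}(\underbrace{\mathbf{n} + \mathbf{e}_i}_{\text{New state}}) + \Psi(\underbrace{\mathbf{n} + \mathbf{e}_i - \mathbf{X}^t}_{\text{New deviation vector}}) \Big\}, \tag{23}$$

and the boundary conditions $V^{T+1}(\mathbf{n}) = 0 \ \forall\, \mathbf{n}$. For notational convenience, define

$$\Omega^t(\mathbf{n}) = V^t(\mathbf{n}) + \Psi(\mathbf{n} - \mathbf{X}^{t-1}). \tag{24}$$

Also, define the pairwise decision functions

$$\gamma_{ij}^t(\mathbf{n}) \triangleq \Omega^{t+1}(\mathbf{n} + \mathbf{e}_i) - \Omega^{t+1}(\mathbf{n} + \mathbf{e}_j), \quad i \neq j. \tag{25}$$

It follows that $\Pi_T^*$ "prefers" $\mathcal{M}_i$ over $\mathcal{M}_j$ in the $t$-th time-slot in state $\mathbf{n}$ if $\gamma_{ij}^t(\mathbf{n}) < 0$, and "prefers" $\mathcal{M}_j$ else. The pairwise decision functions satisfy the following:

Lemma 1 (Monotonicity of $\gamma$): $\gamma_{ij}^t(\mathbf{n})$ is a non-decreasing function of $n_i$ and a non-increasing function of $n_j$ for $i, j \in \{0, \ldots, N\}$, $i \neq j$, and $t = 1, \ldots, T$.

Proof: See Appendix VIII-B. ■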

Lemma 1 can be used to show that any two-dimensional subspace of the $N$-dimensional state-space is partitioned into $N + 1$ connected decision regions by $\Pi_T^*$. The states in the $i$-th decision region are those in which $\Pi_T^*$ schedules $\mathcal{M}_i$ in the $t$-th time-slot. Further, for every $t$, as $n_i^t$ increases for fixed $n_j^t$, $j \neq i$, $\Pi_T^*$ switches over from $\mathcal{M}_i$ to $\mathcal{M}_k$ for some $k \neq i$. Thereafter, $\Pi_T^*$ never switches back to $\mathcal{M}_i$. Unfortunately, this neat structural insight does not immediately yield a low complexity approximation of the optimal policy.
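For small instances, the DP recursion for the single server problem can be evaluated directly by backward induction. The sketch below is illustrative only and assumes quadratic costs, zero initial deviations, and a hypothetical data layout in which `tsp[i][t-1]` is the $t$-th TSP entry of physical meta-queue $i+1$. A greedy rule that minimizes only the instantaneous cost is included for comparison.

```python
from functools import lru_cache

def solve_dp(tsp, T):
    """Optimal total cost over T slots, starting from the all-zero state."""
    N = len(tsp)
    X = [[sum(row[:t]) for row in tsp] for t in range(T + 1)]   # cumulative TSPs

    def psi(n, t):                        # quadratic cost of deviation n - X^t
        return sum((n[i] - X[t][i]) ** 2 for i in range(N))

    @lru_cache(maxsize=None)
    def V(t, n):
        if t == T + 1:
            return 0                      # boundary condition
        best = V(t + 1, n) + psi(n, t)    # idle the server (dummy meta-queue)
        for i in range(N):
            m = n[:i] + (n[i] + 1,) + n[i + 1:]
            best = min(best, V(t + 1, m) + psi(m, t))
        return best

    return V(1, (0,) * N)

def myopic_cost(tsp, T):
    """Total cost of the greedy rule that minimizes the instantaneous cost."""
    N = len(tsp)
    n, total = [0] * N, 0
    for t in range(1, T + 1):
        X_t = [sum(row[:t]) for row in tsp]
        def cost(i):                      # i = 0 idles, i >= 1 serves meta-queue i
            return sum((n[j] + (j == i - 1) - X_t[j]) ** 2 for j in range(N))
        i_star = min(range(N + 1), key=cost)
        total += cost(i_star)
        if i_star > 0:
            n[i_star - 1] += 1
    return total
```

The number of reachable states grows quickly with both T and N, which is why this direct evaluation is only usable for toy instances.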

Example 9: Consider a system with three meta-queues ($N = 3$) and a time-horizon of $T = 40$ time-slots. Let us fix $n_3^t = 8$ and look at the projection of the three dimensional state-space on the $(n_1^t, n_2^t)$ plane, for $t = 30$. The entries in the TSPs of the meta-queues were generated from an i.i.d. Bernoulli process with parameter $p$. Fig. 4 and Fig. 5 depict the partitioning of the $(n_1, n_2)$ plane for $p = 0.1$ and $p = 0.3$, respectively. Since the TSPs are more sparse in the case $p = 0.1$ (more relaxed deadlines), it is optimal to idle the server in several states.

Fig. 4. Partitioning of the $(n_1, n_2)$ plane into decision regions for fixed $n_3 = 8$, $T = 40$, $t = 30$ for $p = 0.1$. The states in which it is optimal to schedule $\mathcal{M}_0$, $\mathcal{M}_1$, $\mathcal{M}_2$, and $\mathcal{M}_3$ are depicted by □, o, x, and ★ respectively.



C. The myopic/greedy policy 

The complexity of computing the optimal policy increases exponentially in both $T$ and $N$. However, the per time-slot complexity of the myopic/greedy policy associated with the problem is only $O(N)$. The myopic policy schedules $\mathcal{M}_{i^*}$ in the $t$-th time-slot in state $\mathbf{n}$ such that

$$i^* = \arg\min_{j = 0, \ldots, N} \left\{ \Psi(\mathbf{n} + \mathbf{e}_j - \mathbf{X}^t) \right\}. \tag{26}$$

Once again, the myopic policy for the aforementioned scheduling problem is provably optimal for the case $N = 2$. The proof is similar to the proof of Theorem 1 and is omitted.

Fig. 5. Partitioning of the $(n_1, n_2)$ plane into decision regions for fixed $n_3 = 9$, $T = 40$, $t = 30$ for $p = 0.3$. The states in which it is optimal to schedule $\mathcal{M}_0$, $\mathcal{M}_1$, $\mathcal{M}_2$, and $\mathcal{M}_3$ are depicted by □, o, x, and ★ respectively.



D. The Largest Lag First (LLF) policy 

Consider the special case of quadratic cost functions for concreteness. It is easily seen that the myopic policy in this case selects $\mathcal{M}_{i^*}$ such that

$$i^* = \begin{cases} \tilde{i} & ;\ 2(d_{\tilde{i}} - x_{\tilde{i}}^t) + 1 < 0 \\ 0 & ;\ \text{else}, \end{cases} \tag{27}$$

where

$$\tilde{i} = \arg\min_{j = 1, \ldots, N} \left\{ d_j - x_j^t \right\}. \tag{28}$$

The arguments are similar to those given in Example 6.

We refer to the policy in (27) as Largest Lag First (LLF), because it chooses the meta-queue with the most negative deviation (equivalently, largest lag). The LLF policy idles the server if the deviations of all meta-queues are non-negative.
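The LLF rule is a single pass over the meta-queue deviations. A minimal sketch, assuming `dev[j-1]` holds $d_j - x_j^t$ for physical meta-queue $j$ and that a return value of 0 denotes the dummy meta-queue (idle); the function name is hypothetical.

```python
def llf_step(dev):
    """LLF decision: index (1..N) of the meta-queue to serve, or 0 to idle."""
    j = min(range(len(dev)), key=lambda k: dev[k])    # most negative deviation
    # Serving changes the quadratic cost by 2 * dev[j] + 1; serve only if negative.
    return j + 1 if 2 * dev[j] + 1 < 0 else 0
```

For integer deviations the threshold test is equivalent to checking that the selected deviation is strictly negative, which matches the idling behaviour described above.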



E. Meta-queue based service trace control 

Having studied the salient features of the single server scheduling problem, we return our attention to the service trace control problem for the IQ switch. Suppose that the STC chooses a configuration from subset $\mathcal{S}_{\mathbf{v}}$ alone, and VOQ deviations are mapped to meta-queue deviations through a mapping $\Gamma$. Let $\mathcal{I}_i = \{i_1, \ldots, i_N\}$ denote the set of VOQs which are served by the $i$-th configuration in $\mathcal{S}_{\mathbf{v}}$, namely $\mathcal{C}^{i-1}(\mathbf{v})$. These VOQs constitute the $i$-th meta-queue in the single server system. Given deviation vector $\mathbf{d}$ for the switch, we denote the deviation of the $i$-th meta-queue by $\Gamma(\mathbf{d}; \mathcal{I}_i)$. The LLF policy of Section IV-D then schedules the $j$-th meta-queue such that

$$j = \arg\min_{i = 1, \ldots, N} \left\{ \Gamma(\mathbf{d}; \mathcal{I}_i) \right\}. \tag{29}$$

Once a meta-queue is chosen, a partial configuration is extracted by idling VOQs with non-negative deviation. We now examine two special choices of $\Gamma$.

1) The MSL-SS policy revisited: Consider $\Gamma(\mathbf{d}; \mathcal{I}) = \sum_{j \in \mathcal{I}} d_j$. With this choice of $\Gamma$, we recover the MSL-SS policy proposed in Section III-B. Thus, the service trace control problem for an IQ switch with single subset operation is equivalent to the single server scheduling problem if all cost functions (both for the VOQs and the meta-queues) are quadratic and the deviation of a meta-queue is defined as the sum of the deviations of its constituent VOQs.

2) The LLF-SS policy: Consider $\Gamma(\mathbf{d}; \mathcal{I}) = \min_{j \in \mathcal{I}} \{d_j\}$. For this $\Gamma$, the STC in effect selects the VOQ with the largest lag. Since every VOQ is associated with a unique configuration in $\mathcal{S}_{\mathbf{v}}$, selecting a VOQ immediately identifies a unique switch configuration. We call this policy the Largest Lag First - Single Subset (LLF-SS) policy. The per time-slot complexity of LLF-SS is $O(N^2)$, since it involves computing the maximum of an unsorted list of $N^2$ numbers. It can be reduced to $O(N)$ through parallelization or efficient data structures (for maintaining dynamic lists).
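Both special cases share the same selection skeleton and differ only in the mapping applied to each meta-queue. The following sketch makes that explicit; the mapping is passed as a Python callable, VOQs are 0-indexed as q = i*n + j, and all names are hypothetical.

```python
def meta_queue_step(dev, gen, gamma):
    """Serve the meta-queue (configuration of the subset generated by `gen`)
    whose mapped deviation gamma(...) is smallest; returns the VOQs served."""
    n = len(gen)
    groups = [[i * n + gen[(i - k) % n] for i in range(n)] for k in range(n)]
    best = min(groups, key=lambda g: gamma([dev[q] for q in g]))
    return [q for q in best if dev[q] < 0]      # idle non-lagging VOQs

def msl_ss(dev, gen):
    return meta_queue_step(dev, gen, sum)       # meta-queue deviation = sum

def llf_ss(dev, gen):
    return meta_queue_step(dev, gen, min)       # meta-queue deviation = min
```

With `dev = [-3, 0, -1, -1, 2, 0, 0, -1, 2]` on the subset generated by `(0, 1, 2)`, the sum mapping picks the configuration with three mildly lagging VOQs, while the min mapping picks the configuration containing the single most lagged VOQ.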

Remark 5: There is a natural interpretation for the above choice of $\Gamma$. Suppose that the switch is operated using only complete configurations and has been set in configuration $\mathcal{C}^{i-1}(\mathbf{v})$ $\tau$ times by the end of the $t$-th time-slot. Denote by $S_{i_j}^t$ the $t$-th entry of the cTSP of $Q_{i_j}$. It follows that the deviation of $Q_{i_j}$ in the $t$-th time-slot is $d_{i_j}^t = \tau - S_{i_j}^t$, since $\mathcal{C}^{i-1}(\mathbf{v})$ serves VOQs indexed by $\mathcal{I}_i = \{i_1, \ldots, i_N\}$. Now, define

$$\bar{S}_i^t = \max_{j = 1, \ldots, N} \left\{ S_{i_j}^t \right\}. \tag{30}$$

If configuration $\mathcal{C}^{i-1}(\mathbf{v})$ is chosen at least $\bar{S}_i^t$ times by the end of the $t$-th time-slot ($\tau \geq \bar{S}_i^t$), all $N$ VOQs indexed by set $\mathcal{I}_i$ have a non-negative deviation (no lag). We therefore let $\bar{\mathbf{S}}_i = (\bar{S}_i^1, \bar{S}_i^2, \ldots)$ be the cTSP of the meta-queue generated by grouping VOQs indexed by set $\mathcal{I}_i$. It follows that the deviation of the $i$-th meta-queue in the $t$-th time-slot is $\bar{d}_i^t = \tau - \max_{j = 1, \ldots, N} \{S_{i_j}^t\}$, i.e., $\bar{d}_i^t = \min_{j = 1, \ldots, N} \{d_{i_j}^t\}$. In words, the deviation of a meta-queue is the minimum of the deviations of its constituent VOQs.

Remark 6: The meta-queue construction provides a general framework for designing service trace control policies under single subset operation. While we have illustrated the idea with two specific examples here, different families of policies with varying performance tradeoffs can be constructed by appropriately selecting the mapping $\Gamma$, as well as the meta-queue selection policy. For instance, we can set $\Gamma(\mathbf{d}; \mathcal{I}) = \sum_{j \in \mathcal{I}} c_j d_j$, where $\{c_j\} > 0$ are weight parameters chosen to provide differentiated QoS to VOQs.



V. Admissible Region and Subset Selection 

Consider a traffic stream with target stream profile (TSP) $\mathbf{s}$ and the corresponding cumulative TSP $\mathbf{S}$. The average "distance" between consecutive "1"s in $\mathbf{s}$ can be interpreted as the average packet inter-departure time target associated with this traffic stream. By definition, $S^t$ is the number of "1"s in the TSP in the first $t$ time-slots. We assume that the limit

$$\lambda = \lim_{t \to \infty} \frac{S^t}{t} \tag{31}$$

exists for every traffic stream, and refer to $1/\lambda$ as the average packet inter-departure time (IDT) target for the traffic stream. Going back to Example 3, where we considered periodic traffic with period 5, we see that $S^t = \lfloor t/5 \rfloor$ and $\lambda = 1/5$.
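The load of a stream can be estimated from a finite prefix of its TSP. A small sketch, assuming the periodic stream places its "1"s in slots 5, 10, 15, and so on; the function name is hypothetical.

```python
from itertools import accumulate

def empirical_load(tsp):
    """Return S^t / t for t = len(tsp), where S^t is the cumulative TSP."""
    ctsp = list(accumulate(tsp))          # running count of 1s
    return ctsp[-1] / len(tsp)

# Periodic stream with period 5 over 1000 slots.
tsp = [1 if t % 5 == 0 else 0 for t in range(1, 1001)]
load = empirical_load(tsp)
```

For this stream the cumulative count after $t$ slots is $\lfloor t/5 \rfloor$, so the ratio settles at 1/5 as $t$ grows.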

A larger $\lambda$ implies smaller IDT targets on average, which means the stream requires more service from the switch. Thus, $\lambda$ can be thought of as the load imposed by a stream on the switch. Letting $\lambda_i$ denote the load imposed by the stream at the $i$-th VOQ, we define

$$\boldsymbol{\lambda} \triangleq (\lambda_1, \ldots, \lambda_{N^2}) \tag{32}$$

as the load vector for the switch. We now consider a special case where the IDT targets for the traffic stream associated with the $i$-th VOQ are geometrically distributed² with parameter $\lambda_i \in (0, 1)$. Equivalently, every entry in the TSP of $Q_i$ is an independent identically distributed (i.i.d.) Bernoulli random variable³ with mean $\lambda_i$. We

²For a geometrically distributed random variable $X$ with parameter $p$, the probability mass function is given by $P[X = k] = (1 - p)^{k-1} p$, $k = 1, 2, \ldots$

³For a Bernoulli random variable $X$ with parameter $p$, the probability mass function is given by $P[X = 0] = 1 - p$ and $P[X = 1] = p$.

refer to this scenario as i.i.d. loading. Further, we refer to the scenario $\lambda_i = \lambda \ \forall\, i$, i.e., $\boldsymbol{\lambda} = (\lambda / N^2)\mathbf{1}$, as uniform i.i.d. loading. Using the notation introduced earlier, the i.i.d. assumption implies:

$$E[\mathbf{x}^t] = \boldsymbol{\lambda} \ \forall\, t$$
$$E[x_i^t x_j^t] = \lambda_i \text{ if } j = i \text{ and } E[x_i^t x_j^t] = 0 \text{ if } j \neq i, \ \forall\, t$$
$$E[x_i^t x_j^{\tau}] = 0 \ \forall\, i, j \text{ if } \tau \neq t. \tag{33}$$

Next, we define the admissible region as the set of all load vectors for which some service trace control policy guarantees finite lags to all VOQs, at all times. Alternatively, if the switch is subject to a load vector not contained in the admissible region, the lag of at least one VOQ grows without bound, regardless of the service trace control policy employed. The admissible region for an IQ switch is given by

$$\Lambda = \Big\{ \boldsymbol{\lambda} : \sum_{j=1}^{N} \lambda_{(i-1)N + j} \leq 1,\ \sum_{j=1}^{N} \lambda_{(j-1)N + i} \leq 1,\ i = 1, \ldots, N,\ \lambda_i \in (0, 1),\ i = 1, \ldots, N^2 \Big\}. \tag{34}$$
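The per-port constraints behind the admissible region can be checked mechanically. The sketch below assumes that the constraints bound every input (row) and output (column) load sum by 1 and that the bounds are non-strict; both the reading of the bound and the function name are assumptions. Loads are stored with `load[i*n + j]` for input i and output j (0-indexed).

```python
def is_admissible(load, n):
    """Check that no input port and no output port of an n x n switch
    is loaded beyond 1 (non-strict bound assumed)."""
    rows = [sum(load[i * n + j] for j in range(n)) for i in range(n)]   # per input
    cols = [sum(load[i * n + j] for i in range(n)) for j in range(n)]   # per output
    return all(s <= 1 for s in rows + cols)
```

A load vector that passes this check may still be unsupportable under single subset operation, which is a restriction of the policy rather than of the switch.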

A policy which ensures finite lags for all VOQs for all load vectors $\boldsymbol{\lambda} \in \Lambda$ is said to be 100% admissible. More formally,

Definition 4: A policy $\Pi$ is 100% admissible if $\forall\, \boldsymbol{\lambda} \in \Lambda$, $\liminf_{t \to \infty} E^{\Pi}[d_i^t] > -\infty \ \forall\, i$, where the notation $E^{\Pi}[\cdot]$ implies that the expectation is computed under policy $\Pi$. As a special case, a policy $\Pi$ is 100% admissible under i.i.d. loading if it satisfies the aforementioned property for all i.i.d. load vectors in $\Lambda$.

Theorem 2: $\liminf_{t \to \infty} E[d_i^t] > -\infty \ \forall\, i$ under the MSL($\ell$) policy, for any admissible i.i.d. load.

Proof: See Appendix VIII-C. ■

We are now ready to answer the two questions raised at the end of Section III regarding the efficacy of subset based control. Our answer to the first question is that by restricting operation to a single subset, not all load vectors in $\Lambda$ can be supported. However, all uniform loads can be supported. In particular,

Theorem 3: $\liminf_{t \to \infty} E[d_i^t] > -\infty \ \forall\, i$ under the LLF-SS policy, for any admissible uniform i.i.d. load, independent of the choice of operational subset.

Proof: See Appendix VIII-D. ■

Theorem 4: $\liminf_{t \to \infty} E[d_i^t] > -\infty \ \forall\, i$ under the MSL($\ell$)-SS policy, for any admissible uniform i.i.d. load, independent of the choice of operational subset.

Proof: See Appendix VIII-E. ■

It must be noted that there exists a non-empty subset of non-uniform load vectors in $\Lambda$ (even near the "boundary" of $\Lambda$) under which LLF-SS guarantees bounded lags to all VOQs, if the operational subset is suitably chosen.

Example 10: Consider $\boldsymbol{\lambda} = (1 - \epsilon, 0, 0, 0, 0, 1 - \epsilon, 0, 1 - \epsilon, 0)$ for a 3 x 3 switch, for some $\epsilon \in (0, 2/3)$. In this case, operating LLF-SS with the first subset (generated by $\mathbf{v}_1$) in Fig. 3 cannot guarantee bounded lags to all VOQs, while operating the same policy with the second subset (generated by $\mathbf{v}_2$) can guarantee bounded lags.

It is possible to construct non-uniform i.i.d. load vectors in $\Lambda$ under which LLF-SS cannot guarantee bounded lags to all VOQs, irrespective of the choice of the operational subset.

Example 11: Consider $\boldsymbol{\lambda} = (c, 0, 0, 0, c/2, c/2, 0, c/2, c/2)$ where $c = 1 - \epsilon$ for some $\epsilon \in (0, 1/2)$. The LLF-SS policy cannot guarantee bounded lags to all VOQs, regardless of the choice of operational subset.

Similar examples can be constructed for the MSL($\ell$)-SS policy.
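Examples 10 and 11 can be explored with a back-of-the-envelope necessary condition, which is our own reading and is not stated in the paper: under single subset operation each VOQ is served by exactly one configuration, so that configuration must be selected at a long-run rate of at least the largest load among its VOQs, and these rates cannot sum to more than 1. The sketch below computes that sum; the name `subset_demand` and the 0-indexed generator tuples are hypothetical.

```python
def subset_demand(load, gen):
    """Sum, over the N configurations of the subset generated by `gen`,
    of the largest VOQ load each configuration has to carry."""
    n = len(gen)
    total = 0.0
    for k in range(n):
        voqs = [i * n + gen[(i - k) % n] for i in range(n)]
        total += max(load[q] for q in voqs)
    return total

eps = 0.1
c = 1 - eps
ex10 = [c, 0, 0, 0, 0, c, 0, c, 0]
ex11 = [c, 0, 0, 0, c / 2, c / 2, 0, c / 2, c / 2]
first, second = (0, 1, 2), (0, 2, 1)     # generators of the two subsets in Fig. 3
```

For Example 10 the demand on the first subset is $3(1 - \epsilon)$, which exceeds 1 exactly when $\epsilon < 2/3$, while the second subset needs only $1 - \epsilon$. For Example 11 both subsets need $2(1 - \epsilon)$, which exceeds 1 when $\epsilon < 1/2$.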



A. Randomized subset selection 

As we saw in the previous section, service trace control based on single subset operation is not enough to support 
all admissible loads. However, subset based operation in conjunction with an appropriate subset selection policy 
can achieve the desired goal. We propose one such subset selection policy in this section. To this end, denote the 

⁴The theory developed here can be extended to the case where TSP entries are generated from a Markov modulated Bernoulli process by considering multi-step drifts of the Lyapunov function (see, for example, [24]).

Policy       | Brief description                                           | Complexity | 100% admissible | Knows λ
MSL          | Pick configuration (from V) with maximum sum of VOQ lags    | O(N^3)     | ✓               | ✗
MSL-SS       | Pick configuration (single subset) with max sum of VOQ lags | O(N^2)     | ✗               | ✗
LLF-SS       | Pick configuration (single subset) with most lagged VOQ     | O(N^2)     | ✗               | ✗
MSL-RS       | Randomized subset selection + MSL-SS                        | O(N^2)     | ✓               | ✓
LLF-RS       | Randomized subset selection + LLF-SS                        | O(N^2)     | ✓               | ✓
MSL-pSEL(P)  | Periodic subset selection (with period P) + MSL-SS          | O(N^2)     | ✓               | ✗
LLF-pSEL(P)  | Periodic subset selection (with period P) + LLF-SS          | O(N^2)     | ✓               | ✗

TABLE I

Key properties of some service trace control policies proposed in the paper.



$k$-th configuration vector by $\mathbf{v}_k$ and the corresponding generated subset by $\mathcal{S}_k = \{\mathcal{C}^i(\mathbf{v}_k)\}_{i=0}^{N-1}$. Consider the Birkhoff-von Neumann (BV) decomposition [13] of load vector $\boldsymbol{\lambda} \in \Lambda$ given by

(Ar_l)!;V-l {Ar-l)!Ar_i 
k=l i=0 k=l 4=0 

Define a probability distribution on the subsets by 

9k = -Y,Cik, k = l,2,...,iN -1)1 (36) 
^ i=0 

Now, consider the following two-step service trace control policy, namely Maximum Sum of Lags - Random Subset (MSL-RS), which combines the MSL-SS policy of Section III-B with the notion of randomized subset selection:

1) Select configuration subset $\mathcal{S}_k$ with probability $\theta_k$.

2) Select a configuration from $\mathcal{S}_k$ based on MSL-SS.
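The two steps can be sketched as a single decision function. The subset probabilities are taken as an input here, since computing them requires the BV decomposition of the load vector; the function name, the 0-indexed generator tuples, and the use of Python's `random.choices` for the draw are assumptions made for illustration.

```python
import random

def msl_rs_step(dev, generators, theta, rng=random):
    """One MSL-RS decision: draw a configuration subset with probabilities
    `theta`, then apply the MSL-SS rule inside it. Returns the VOQs served."""
    gen = rng.choices(generators, weights=theta, k=1)[0]     # Step 1
    n = len(gen)
    groups = [[i * n + gen[(i - k) % n] for i in range(n)] for k in range(n)]
    best = min(groups, key=lambda g: sum(dev[q] for q in g)) # Step 2 (MSL-SS)
    return [q for q in best if dev[q] < 0]
```

Passing a seeded `random.Random` instance as `rng` makes a simulation run reproducible.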

The computational complexity of MSL-RS is $O(N^2)$ per time-slot, since MSL-SS has complexity $O(N^2)$ and the BV decomposition of $\boldsymbol{\lambda}$ contains at most $N^2 - 2N + 2$ non-zero terms [13].

Theorem 5: $\liminf_{t \to \infty} E[d_i^t] > -\infty \ \forall\, i$ under MSL($\ell$)-RS, for any admissible i.i.d. load.

Proof: See Appendix VIII-F. ■

The Largest Lag First - Random Subset (LLF-RS) policy is constructed analogously, by combining the idea of randomized subset selection with the LLF-SS policy of Section IV-E.2. Finally, note that the MSL-RS and LLF-RS policies can be extended to construct the MSL($\ell$)-RS and LLF($\ell$)-RS families of policies, respectively (discussed in Remark 3). These policies allow traffic streams to enjoy a lead of up to $\ell > 0$, instead of idling them when they are not lagging.



B. Periodic subset selection 

1) The pSEL(P) rule: The MSL($\ell$)-RS policy proposed in the previous section can support all admissible loads and has low computational complexity. However, the policy requires a priori knowledge of the load vector $\boldsymbol{\lambda}$. We, on the other hand, are interested in designing robust control policies which do not rely on statistical knowledge about the input traffic streams. Thus, to eliminate dependence on $\boldsymbol{\lambda}$, we propose the following periodic subset selection rule: Suppose the switch is currently being operated using configuration subset $\mathcal{S}_{\mathbf{v}}$. Every $P > 0$ time-slots, a complete configuration $\mathbf{v}^*$ is selected, based on some service trace control policy. If $\mathbf{v}^* \in \mathcal{S}_{\mathbf{v}}$, the switch continues to operate with configuration subset $\mathcal{S}_{\mathbf{v}}$, otherwise the switch starts operating in the configuration subset generated by $\mathbf{v}^*$, viz., $\mathcal{S}_{\mathbf{v}^*} = \{\mathbf{v}^*, \mathcal{C}(\mathbf{v}^*), \ldots, \mathcal{C}^{N-1}(\mathbf{v}^*)\}$. Once a configuration subset has been selected, the switch can be operated using any subset based service trace control policy (e.g. MSL-SS). We refer to this subset selection rule as pSEL(P) (Periodic Selection with period P).

2) The MSL-pSEL(P) policy: We combine the pSEL(P) subset selection rule with the MSL policy of Section II-E and the MSL-SS policy of Section III-B to propose the Maximum Sum of Lags - Periodic Selection (P) (MSL-pSEL(P)) service trace control policy. Every $P$ time-slots, the MSL-pSEL(P) policy computes the switch configuration $\mathbf{v}^*$ based on the MSL policy. If $\mathbf{v}^*$ is in the current operational subset, the MSL-pSEL(P) policy continues to operate the switch using the current subset, otherwise it switches to the configuration subset generated by $\mathbf{v}^*$, viz., $\{\mathbf{v}^*, \mathcal{C}(\mathbf{v}^*), \ldots, \mathcal{C}^{N-1}(\mathbf{v}^*)\}$. In the intermediate $P - 1$ time-slots, MSL-pSEL(P) operates the switch using the MSL-SS policy.
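The interplay between the periodic subset update and the per-slot MSL-SS decision can be sketched as follows. This is an illustration under assumptions: the update slot uses a brute-force search over all permutations in place of a maximum weight matching, the update happens in slots t = 0, P, 2P, ..., the configuration found in an update slot is also the one used in that slot, and all names are hypothetical.

```python
from itertools import permutations

def subset_groups(gen):
    """VOQ index lists of the N configurations in the subset generated by gen."""
    n = len(gen)
    return [[i * n + gen[(i - k) % n] for i in range(n)] for k in range(n)]

def msl_psel_step(t, dev, gen, period):
    """One MSL-pSEL(P) decision in slot t. Returns (VOQs served, generator)."""
    n = len(gen)
    if t % period == 0:
        # Update slot: full MSL search over all complete configurations.
        best = min(permutations(range(n)),
                   key=lambda pi: sum(dev[i * n + pi[i]] for i in range(n)))
        chosen = [i * n + best[i] for i in range(n)]
        if chosen not in subset_groups(gen):
            gen = best                    # operate on the subset generated by v*
    else:
        # Intermediate slot: MSL-SS on the current subset.
        chosen = min(subset_groups(gen), key=lambda g: sum(dev[q] for q in g))
    return [q for q in chosen if dev[q] < 0], gen
```

Only one slot in every P pays for the expensive search, which is where the amortized complexity advantage of the periodic rule comes from.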

The per time-slot complexity of the MSL policy is $O(N^3)$. This computation needs to be done every $P$ time-slots to update the configuration subset. The complexity of the MSL-SS policy is $O(N^2)$, as discussed in Section III-B. The MSL-SS policy needs to be executed in the $P - 1$ intermediate time-slots between configuration subset updates. Thus, the computational complexity of MSL-pSEL(P) is $O\left(N^3/P + (1 - 1/P)N^2\right)$. If $P = O(N)$, i.e., if the configuration subset is updated roughly every $N$ time-slots for a switch of size $N \times N$, the complexity of MSL-pSEL(P) is $O(N^2)$.

MSL-pSEL(P) has all the desired traits - a manageable complexity of $O(N^2)$, no dependence on load vector $\boldsymbol{\lambda}$, and as Theorem 6 tells us, it is 100% admissible under i.i.d. loading.

Theorem 6: $\liminf_{t \to \infty} E[d_i^t] > -\infty \ \forall\, i$ under MSL-pSEL(P), for any admissible i.i.d. load.

Proof: See Appendix VIII-G. ■

We close this section by noting that the Largest Lag First - Periodic Selection (P) (LLF-pSEL(P)) policy is constructed by combining the LLF policy (Section IV-D) and the LLF-SS policy (Section IV-E.2) with the pSEL(P) rule.

VI. Performance Evaluation 

In this section, we present simulation results to characterize the performance of the proposed service trace control 
policies. 

A. Simulation setup 

All results presented here are for a 16 x 16 IQ switch. We contrast the performance of the following policies: 

• Maximum Sum of Lags (MSL): This policy was proposed in Section II-E. MSL computes the maximum weight
matching (with VOQ lags as edge weights) over all possible switch configurations (a set of size $N!$), and will
be the benchmark for all other lower complexity policies.

• Maximum Sum of Lags - Single Subset (MSL-SS): This policy was proposed in Section III-B. MSL-SS computes
the maximum weight matching over one configuration subset only (using VOQ lags as edge weights).

• Largest Lag First - Single Subset (LLF-SS): This policy was proposed in Section IV-E.2. LLF-SS operates on
a single configuration subset, and picks the VOQ (and hence the configuration, since each VOQ is associated
with a unique configuration within a subset) with the largest lag.

• Maximum Sum of Lags - Periodic Selection (MSL-pSEL(16)): This policy was proposed in Section V-B.2. The
behavior is similar to MSL-SS, except that the underlying operational configuration subset is updated every
16 slots.

• Largest Lag First - Periodic Selection (LLF-pSEL(16)): This policy was proposed in Section V-B.2. The
behavior is similar to LLF-SS, except that the underlying operational configuration subset is updated every 16
slots.

Salient features of the above policies are enumerated in Table I. All policies require a two-step implementation.
A complete switch configuration is selected in the first step and all VOQs with non-negative deviations are idled
to extract a partial configuration in the second step. Thus, no VOQ can lead under any of the policies under
consideration. We also simulated the performance of policies which allow VOQs to acquire a lead of up to $\ell > 0$,
but did not observe any relative difference in the performance of different policies for fixed $\ell$.
We consider the following two performance metrics: 

• Average Deviation: Empirical mean of VOQ deviations, averaged over all $16^2 = 256$ VOQs.

• Variance: Empirical variance of VOQ deviations, averaged over all 256 VOQs. 

Since no VOQ can lead in our simulation setup, the deviation of every VOQ (and hence the average deviation for
every policy) is upper bounded by zero.
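
The two metrics can be computed from recorded per-slot deviations as in the following sketch. It assumes that `traces[q]` holds the deviation of VOQ `q` in every simulated slot and that "empirical variance" means the population variance of each trace; both the data layout and the function name are hypothetical.

```python
from statistics import fmean, pvariance


def deviation_metrics(traces):
    """Return (average deviation, variance), each averaged over all VOQs.

    traces[q] is the list of per-slot deviations recorded for VOQ q.
    """
    per_voq_mean = [fmean(trace) for trace in traces]
    per_voq_var = [pvariance(trace) for trace in traces]  # ASSUMPTION: population variance
    return fmean(per_voq_mean), fmean(per_voq_var)
```

For a $16 \times 16$ switch, `traces` would contain 256 lists, one per VOQ.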

Clearly, we want both the mean and variance of the deviations to be close to zero, for all traffic streams. A mean 
close to zero with large variance is not sufficient, since it indicates severe instantaneous positive/negative distortion 
of the target profiles. In other words, it implies that the output of the switch is not "smooth". This is not ideal 
from a flow control perspective, since a bursty output stream from the switch makes scheduling and buffering at 
downstream switches harder. Also, a small variance with a large non-zero mean is undesirable, since it indicates 
that one or more VOQs are missing their deadlines frequently. 

Remark 7: Ideally, all policies should be benchmarked relative to the optimal service trace control policy, which
is computed by solving the DP equations in (11). However, the complexity of evaluating the optimal policy grows
exponentially with the size of the switch, viz. $N$, and the length of the time horizon of interest, viz. $T$. In our setup,
$N = 16$ and $T = 50,000$. The complexity of solving the DP equations for a problem of this magnitude is simply
prohibitive. We therefore resort to the next best option, i.e., using the myopic policy (MSL) as a performance
benchmark. Recall that we have analytically proven the optimality of the myopic policy for $N = 2$ (see Theorem
1) and numerically verified that it is "close" to being optimal for $N = 3, 4$. Note that even the myopic policy is
quite expensive to implement, since it entails computing a maximum weight matching in every time-slot.

B. Discussion of simulation results 

We report simulation results for four distinct loading scenarios. Every point on the performance curves depicted 
here was generated by averaging over 50,000 time-slots. 




Fig. 6. Performance of MSL, MSL-SS and LLF-SS under uniform i.i.d. loading for a 16 x 16 switch (horizontal axis: load per input port).

1) Uniform i.i.d. loading: In this case, all streams have geometrically distributed inter-departure time targets with
parameter $\lambda/16$. The load per input port is therefore $16 \times \lambda/16 = \lambda$ (number of VOQs $\times$ load per stream/VOQ).
The performance of the MSL, MSL-SS, and LLF-SS policies is depicted in Fig. 6. The results show that operating
the switch with a single subset works quite well if the switch is uniformly loaded, especially up to ~ 60% loading
of the switch. The loss in performance vis-a-vis the MSL policy comes with a significant reduction in complexity.
Recall that the MSL policy has to perform a full-blown maximum weight matching computation in every time-slot.
Subset selection is not needed in the uniform loading scenario (Theorems 3 and 4). It suffices to operate the switch
using only one configuration subset, which can be chosen arbitrarily.

Fig. 7. Performance of MSL, MSL-SS and LLF-SS under parallel-heavy i.i.d. loading for a 16 x 16 switch
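
A target stream profile with geometrically distributed inter-departure times can be generated by drawing an independent Bernoulli variable in every slot. The sketch below illustrates this for the uniform scenario; the function names and the use of Python's `random.Random` generator are assumptions made for illustration, not a description of the simulator used to produce the figures.

```python
import random


def bernoulli_profile(rate, horizon, rng):
    """Target stream profile x[0..horizon-1]: x[t] = 1 if a departure is targeted in slot t.

    Independent Bernoulli(rate) slots yield geometric inter-departure times
    with parameter `rate`.
    """
    return [1 if rng.random() < rate else 0 for _ in range(horizon)]


def uniform_load_matrix(n, load_per_port):
    """Per-VOQ rates for uniform loading: every VOQ gets load_per_port / n."""
    return [[load_per_port / n] * n for _ in range(n)]
```

For example, `uniform_load_matrix(16, 0.6)` assigns rate 0.6/16 to each of the 256 VOQs, so that every row (input port) sums to 0.6.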

2) Parallel-heavy i.i.d. loading: In this case, all traffic streams associated with "parallel" VOQs have geometrically
distributed inter-departure time targets with parameter $\lambda_1$ and "diagonal" VOQs have geometrically distributed
inter-departure time targets with parameter $\lambda_2 < \lambda_1$. We call a VOQ parallel if it buffers packets destined from the $i$-th
input port to the $i$-th output port for $i \in \{1, \ldots, 16\}$, and call it diagonal otherwise. Thus, a 16 x 16 switch has 16
parallel VOQs and $16^2 - 16 = 240$ diagonal VOQs. We fixed $\lambda_2$ and varied $\lambda_1$ to vary the load per input port, viz.
$\lambda_1 + 15\lambda_2$. Performance results are depicted in Fig. 7. For MSL-SS and LLF-SS, we selected the subset generated
by the configuration which concurrently serves all 16 parallel VOQs. Since this configuration needs to be selected
frequently, especially as $\lambda_1$ increases, single subset operation based policies perform quite well in this non-uniform
loading scenario. Note that subset based policies would perform poorly in this scenario if the configuration subset
is not selected appropriately. Good performance can however be achieved by combining subset based operation
with the periodic subset selection rule, as illustrated by the next set of simulation results.

3) Cross-heavy i.i.d. loading: Once again, all traffic streams have geometrically distributed inter-departure time
targets. However, VOQs which buffer packets destined from input port $i$ to output port $i + 1$ (for odd $i$) and from
input port $i$ to output port $i - 1$ (for even $i$) are more heavily loaded (parameter $\lambda_1$) than other VOQs (parameter
$\lambda_2 < \lambda_1$). We refer to this scenario as cross-heavy loading because the configuration which serves the more heavily
loaded VOQs forms a criss-cross pattern with eight "crosses" (x). We fixed $\lambda_2$ and varied $\lambda_1$ to vary the load per
input port, viz. $\lambda_1 + 15\lambda_2$. For MSL-SS and LLF-SS, we used the same configuration subset as for the parallel-heavy
i.i.d. loading experiment. Since the cross pattern is not contained in this subset, the performance of single
subset based policies degrades severely as $\lambda_1$ increases, and is therefore not depicted here. However, periodic subset
selection, in conjunction with single subset operation, delivers performance quite close to the benchmark MSL policy,
especially for switch loading up to ~ 65%. Recall that this performance is achieved at a much lower computational
complexity. The results are depicted in Fig. 8. The MSL-SS and LLF-SS policies would have performed well if
we had chosen the configuration subset which serves the most heavily loaded VOQs (the ones forming the cross
pattern) as the operational subset. This is exactly what we had done in the parallel-heavy loading scenario. However, it
is not always possible for the switch controller to have a priori information about the loading pattern. Subset based
operation therefore needs to be combined with a subset selection rule to ensure good performance in unknown
loading scenarios.

Fig. 8. Performance of MSL, MSL-pSEL(16) and LLF-pSEL(16) under cross-heavy i.i.d. loading for a 16 x 16 switch
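
The observation that the cross pattern is not contained in the subset used for the parallel-heavy experiment can be checked mechanically. The sketch below assumes, as an illustration only, that a configuration subset consists of the cyclic output shifts of its generating configuration; ports are indexed from 0, so the paper's odd input ports correspond to even indices. All names are hypothetical.

```python
def cyclic_subset(perm):
    """ASSUMPTION: subset generated by perm = its N cyclic output shifts."""
    n = len(perm)
    return [tuple((perm[i] + k) % n for i in range(n)) for k in range(n)]


def parallel_config(n):
    """Configuration serving all parallel VOQs: input i -> output i."""
    return tuple(range(n))


def cross_config(n):
    """Cross pattern (0-indexed): ports are paired (0,1), (2,3), ... and each
    input is connected to its partner's output. Requires even n."""
    return tuple(i + 1 if i % 2 == 0 else i - 1 for i in range(n))


def heavy_load_matrix(heavy_config, lam1, lam2):
    """Rate lam1 on the VOQs served by heavy_config, lam2 on all other VOQs."""
    n = len(heavy_config)
    return [[lam1 if heavy_config[i] == j else lam2 for j in range(n)]
            for i in range(n)]
```

With `n = 16`, `cross_config(16)` does not appear in `cyclic_subset(parallel_config(16))`, whereas it is by construction the first element of `cyclic_subset(cross_config(16))`.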

4) Uniform periodic loading: In this case, all traffic streams have identical inter-departure time targets equal
to $\delta$ (see Example 3). The streams are offset relative to each other, i.e., the TSPs of all streams are time-shifted
versions of each other. For instance, suppose the desired departure times of packets from one of the streams are
$(\tau, \tau + \delta, \tau + 2\delta, \ldots)$ for some $\tau \in \mathbb{Z}_+$, $\delta \in \mathbb{N}$. The desired departure times of packets from another stream traversing
the same switch could be $(\tau + \tau', \tau + \tau' + \delta, \tau + \tau' + 2\delta, \ldots)$ for some $\tau' \in \mathbb{Z}_+$. Both streams are periodic with IDTs
equal to $\delta$, but are offset by $\tau'$ slots with respect to each other. We generated the offsets uniformly at random from
the set $\{0, 1, \ldots, \delta - 1\}$ and varied $\delta$ from 18 to 36 slots to vary the load per input port, viz. $16/\delta$. The performance
results are depicted in Fig. 9. The efficacy of single subset based operation under uniform switch loading is evident
from the plots. For instance, even at 80% loading of the switch, the average deviation and the variance of the
deviation under the MSL-SS policy are $-0.3$ and $0.2$, respectively. This means that on average, the received
service traces for all 256 VOQs lag the target stream profiles by only 0.3 time-slots, with a standard
deviation of $\approx 0.45$ time-slots. Such small negative deviations with small variances can easily be corrected for by
allowing traffic streams to gain a lead of 1-2 time-slots (e.g. by using the MSL($\ell$)-SS policy). Roughly speaking,
using MSL($\ell$)-SS adds $\ell$ time-slots to the average deviation, without impacting the variance. For $\ell = 2$, the average
deviation for the specific example discussed above would be 1.7. With a standard deviation of 0.45, the probability
of missing a deadline (target departure time) will be minimal, even at ~ 80% loading of the switch.
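
The periodic target profiles and the numbers quoted above can be reproduced with a short sketch. The function names are hypothetical, and the profile is represented as a 0/1 list indexed by time-slot, which is an assumption about the data layout rather than the authors' implementation.

```python
import math


def periodic_profile(delta, offset, horizon):
    """0/1 target profile with departures targeted at offset, offset + delta, ..."""
    return [1 if t >= offset and (t - offset) % delta == 0 else 0
            for t in range(horizon)]


def load_per_input_port(n, delta):
    """n VOQs per input port, each targeting one departure every delta slots."""
    return n / delta


def std_from_variance(variance):
    """Standard deviation corresponding to a reported variance."""
    return math.sqrt(variance)
```

For instance, `load_per_input_port(16, 20)` gives 0.8, the 80% loading point discussed above, and `std_from_variance(0.2)` is approximately 0.447, in line with the quoted standard deviation of about 0.45.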

We also evaluated the performance of the proposed policies under non-uniform periodic loading and Markovian 
modulated Bernoulli loading (entries of the target stream profile were generated from an MMB process, instead of 
an i.i.d. Bernoulli process). The performance was observed to be more or less invariant to the statistics of the input 
traffic streams, underlining the robustness of the proposed policies. 

Fig. 9. Performance of MSL, MSL-SS and LLF-SS under uniform periodic loading for a 16 x 16 switch

Remark 8: Our simulation results demonstrate that it is possible to render IQ switches nearly transparent to
deadline sensitive traffic streams by minimizing the distortion of their target profiles. Moreover, this can be
accomplished with low complexity online scheduling policies, and with no prior knowledge of the input traffic
statistics. For the proposed policies, the transparency of the switch is particularly strong under moderate loading
(< 60%), which is a very relevant regime, since a switch is unlikely to be loaded to capacity with real-time traffic.
For example, at ~ 50% loading, for all four loading scenarios simulated here, the average deviation from the target
stream profile under all proposed policies is no worse than $-0.3$, with a variance below 0.2. In particular, the proposed
policies do not require the streams
to be periodic or to be constrained by a leaky bucket, as long as the offered load is within the admissible region of
the switch. Finally, the $O(N^2)$ complexity of the proposed policies renders them more amenable to implementation
in high performance packet switches vis-a-vis other schedulers proposed in the literature for multiclass periodic
traffic, which have a computational complexity of 0{N'^) or 0{N^), in addition to their high implementation 
complexity. 

VII. Conclusion 

We examined the problem of packet switch scheduling for minimizing aggregate distortion of outflow traffic 
streams with respect to target packet inter-departure times. The study was initially motivated by the need to 
provide QoS for real-time multimedia traffic over packet networks. The notion of switch configuration subset 
based control was leveraged to design robust, low complexity, near optimal schedules amenable to implementation 
in high performance packet switches. Such schedules have been shown to achieve 100% pull-throughput under 
certain natural statistics of target profiles. 

Many theoretical questions remain open, including the pull-throughput region of the switch under general target 
profile statistics. Moreover, sweeping experimentation is needed to scope out the design and performance of such 
switches and schedules in broad, diverse target profile regimes. 

VIII. Appendix 

A. Proof of Theorem 1

Proof: For ease of exposition, we only treat the case where the switch is never idled. A 2 x 2 switch can be
set into two possible configurations, $v_1 = (1\ 0\ 0\ 1)$ and $v_2 = (0\ 1\ 1\ 0)$. Given initial state $d^0$, we say that state $d$ is
reachable in the $t$-th time-slot if there is a sequence of configurations which drive the switch to state $d$ in $t$ time-slots.
The reachable states in the $t$-th time-slot constitute the set $\mathcal{R}^t = \{d_k^t = d^0 + k v_1 + (t - k) v_2 - X^t\}_{k=0}^{t}$. State
$d_k^t$ is reached if the STC selects configuration $v_1$ in $k$ of the first $t$ time-slots and $v_2$ in the remaining $t - k$ time-slots.
The states reachable in the $(t+1)$-th time-slot given state $d_k^t$ in the $t$-th time-slot are $d_k^t + v_1 - x^{t+1} = d_{k+1}^{t+1}$ and
$d_k^t + v_2 - x^{t+1} = d_k^{t+1}$. Equivalently, we can identify the state in the $t$-th time-slot by the index $k$, which increases
by 1 in the next time-slot if $v_1$ is chosen and remains unaltered if $v_2$ is chosen.
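
The indexing of reachable states by $k$ can be verified by brute force for a short horizon. In the sketch below, `profile[s]` is the target vector consumed in the $s$-th transition and $X^t$ is the sum of the first $t$ such vectors; this slot-indexing convention and all names are assumptions made for illustration.

```python
V1 = (1, 0, 0, 1)
V2 = (0, 1, 1, 0)


def reachable_states(d0, profile, t):
    """States d_k^t = d0 + k*V1 + (t - k)*V2 - X^t for k = 0, ..., t."""
    X = [sum(x[q] for x in profile[:t]) for q in range(4)]  # cumulative targets
    return [tuple(d0[q] + k * V1[q] + (t - k) * V2[q] - X[q] for q in range(4))
            for k in range(t + 1)]


def step(d, v, x):
    """One transition: add the service configuration v, subtract the targets x."""
    return tuple(dq + vq - xq for dq, vq, xq in zip(d, v, x))
```

Enumerating all $2^t$ configuration sequences with `step` yields exactly the $t + 1$ states returned by `reachable_states`, which is why the policy-dependent part of the state can be tracked by the single index $k$.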
Observe that $d_k^t$ is a sum of two components:

1) $d^0 + k v_1 + (t - k) v_2$, the evolution of which is determined by the service trace control policy

2) $-X^t$, the evolution of which is determined by the inter-departure time targets of the input streams.

The first component can be represented using a directed acyclic graph (DAG). The root of the DAG is at $d^0$. Nodes
of the DAG at depth $t$ correspond to the policy dependent component of the reachable states in the $t$-th time-slot.
There are $t + 1$ nodes at depth $t$, ordered in increasing order of index $k$ from right to left (Fig. 10). The optimal
policy traverses the least cost path from the root to one of the leaves.

We use $C(d_k^t) = i$ to denote that the optimal policy $\Pi^*$ chooses configuration $v_i$ in the $t$-th time-slot in the state
corresponding to index $k$. Also, we define

$\Omega^t(d) \triangleq V^t(d) + \Phi(d)$. (37)

$\gamma^t(d) \triangleq \Omega^{t+1}(d + v_1 - x^t) - \Omega^{t+1}(d + v_2 - x^t)$. (38)

The quantity $\gamma^t(d)$ is interpreted as the decision function in state $d$ in the $t$-th time-slot, i.e., $C(d) = 1$ (configuration
$v_1$ selected) if $\gamma^t(d) < 0$, and $C(d) = 2$ (configuration $v_2$ selected) otherwise.

We need four auxiliary results to prove the optimality of the myopic policy. The proofs of the auxiliary results 
are omitted due to space constraints. 

1) Auxiliary result 1 (AR1): For any state $d$, $\Phi(d + v_1 - v_2) - \Phi(d) \geq \Phi(d) - \Phi(d - v_1 + v_2)$.

Proof. Recall from Section II-B that the cost functions $\phi_i(\cdot)$ associated with the VOQs are convex. Thus, for the
$i$-th VOQ with deviation $d_i$, the following holds: $\phi_i(d_i + 1) - \phi_i(d_i) \geq \phi_i(d_i) - \phi_i(d_i - 1)$. Further, from (8), the
cost function $\Phi$ is the sum of the cost functions of all VOQs. Combining the convexity of $\phi_i$ with the definition of $\Phi$,
we arrive at the desired result.

2) Auxiliary result 2 (AR2): For $t = 1, \ldots, T$ and any state $d$, $V^t(d + v_1 - v_2) - V^t(d) \geq V^t(d) - V^t(d - v_1 + v_2)$.

Proof. The proof is based on inductive arguments. Base case: For the base case, $t = T$, it follows from (11) and
the boundary conditions $V^{T+1}(d) = 0$ that

$V^T(d) = \min\{\Phi(d + v_1 - x^T), \Phi(d + v_2 - x^T)\}$. (39)

Suppose the result is true for $t + 1 \leq T$, i.e.,

$V^{t+1}(d' + v_1 - v_2) - V^{t+1}(d') \geq V^{t+1}(d') - V^{t+1}(d' - v_1 + v_2) \;\; \forall d'$. (40)

We will show that (40) implies that the result is true for $t$, i.e.,

$V^t(d' + v_1 - v_2) - V^t(d') \geq V^t(d') - V^t(d' - v_1 + v_2) \;\; \forall d'$. (41)

Setting $d' = d + v_1 - x^t$ in (40) and invoking AR1, we get

$\Omega^{t+1}(d + 2v_1 - v_2 - x^t) - \Omega^{t+1}(d + v_1 - x^t) \geq \Omega^{t+1}(d + v_1 - x^t) - \Omega^{t+1}(d + v_2 - x^t)$. (42)

Similarly, setting $d' = d + v_2 - x^t$ in (40) and invoking AR1, we get

$\Omega^{t+1}(d + v_1 - x^t) - \Omega^{t+1}(d + v_2 - x^t) \geq \Omega^{t+1}(d + v_2 - x^t) - \Omega^{t+1}(d - v_1 + 2v_2 - x^t)$. (43)

It follows from the definition of $\gamma^t(\cdot)$, (42), and (43) that

$\gamma^t(d + v_1 - v_2) \geq \gamma^t(d) \geq \gamma^t(d - v_1 + v_2)$. (44)

In view of (44), four distinct cases arise:
• $0 > \gamma^t(d + v_1 - v_2) \geq \gamma^t(d) \geq \gamma^t(d - v_1 + v_2)$: In this case, $C^t(d - v_1 + v_2) = C^t(d) = C^t(d + v_1 - v_2) = 1$.
We have,

$V^t(d + v_1 - v_2) = \Omega^{t+1}(d + 2v_1 - v_2 - x^t)$
$V^t(d) = \Omega^{t+1}(d + v_1 - x^t)$
$V^t(d - v_1 + v_2) = \Omega^{t+1}(d + v_2 - x^t)$

The result in (41) now follows from (42) and the above set of equalities.

• $\gamma^t(d + v_1 - v_2) \geq \gamma^t(d) \geq \gamma^t(d - v_1 + v_2) \geq 0$: In this case, $C^t(d - v_1 + v_2) = C^t(d) = C^t(d + v_1 - v_2) = 2$.
We have,

$V^t(d + v_1 - v_2) = \Omega^{t+1}(d + v_1 - x^t)$
$V^t(d) = \Omega^{t+1}(d + v_2 - x^t)$
$V^t(d - v_1 + v_2) = \Omega^{t+1}(d - v_1 + 2v_2 - x^t)$

The result in (41) now follows from (43) and the above set of equalities.

• $\gamma^t(d + v_1 - v_2) \geq 0 > \gamma^t(d) \geq \gamma^t(d - v_1 + v_2)$: In this case, $C^t(d - v_1 + v_2) = C^t(d) = 1$ and
$C^t(d + v_1 - v_2) = 2$. We have,

$V^t(d + v_1 - v_2) = \Omega^{t+1}(d + v_1 - x^t)$
$V^t(d) = \Omega^{t+1}(d + v_1 - x^t)$
$V^t(d - v_1 + v_2) = \Omega^{t+1}(d + v_2 - x^t)$

The result in (41) now follows from the above set of equalities, the definition of $\gamma^t(\cdot)$, and the assumption
that $\gamma^t(d) < 0$.

• $\gamma^t(d + v_1 - v_2) \geq \gamma^t(d) \geq 0 > \gamma^t(d - v_1 + v_2)$: In this case, $C^t(d - v_1 + v_2) = 1$ and $C^t(d) =
C^t(d + v_1 - v_2) = 2$. We have,

$V^t(d + v_1 - v_2) = \Omega^{t+1}(d + v_1 - x^t)$
$V^t(d) = \Omega^{t+1}(d + v_2 - x^t)$
$V^t(d - v_1 + v_2) = \Omega^{t+1}(d + v_2 - x^t)$

The result in (41) now follows from the above set of equalities, the definition of $\gamma^t(\cdot)$, and the assumption
that $\gamma^t(d) \geq 0$.

Since the four cases considered above are collectively exhaustive, the proof is complete.

Fig. 10. A DAG based representation of the reachable states in successive time-slots, for a 2 x 2 switch. The solid lines represent choice
of configuration $v_1$, while the dotted lines represent choice of configuration $v_2$. The inset depicts the evolution of state index $k$, based on
the choice of configuration in a state.

3) Auxiliary result 3 (AR3): For $t = 1, \ldots, T$, $\exists\, k_t^* \in \{0, \ldots, t\}$ such that $C(d_k^t) = 1 \;\forall k \leq k_t^*$ and $C(d_k^t) = 2 \;\forall k > k_t^*$.

Proof: Adding the results of AR1 and AR2, and invoking the definition of $\Omega(\cdot)$, we get

$\Omega^t(d + v_1 - v_2) - \Omega^t(d) \geq \Omega^t(d) - \Omega^t(d - v_1 + v_2)$. (45)

Combining (45) with the definition of $\gamma(\cdot)$, it follows that $\gamma^t(d + v_1 - v_2) \geq \gamma^t(d)$. Finally, from the definition
of $d_k^t$, we get $d_{k+1}^t - d_k^t = v_1 - v_2$. This implies that $\gamma^t(d_{k+1}^t) \geq \gamma^t(d_k^t)$, i.e., the decision function $\gamma^t(d_k^t)$ is
a non-decreasing function of $k$. The proof is based on inductive arguments. We skip the details here. Now, recall
that the optimal configuration in state $d$ at time $t$ is completely determined by the sign of $\gamma^t(d)$. For fixed $t$,
$\gamma^t(d_k^t)$ can change sign at most once as $k$ increases from 0 to $t$ (by virtue of its monotonicity). In other words,
$\exists\, k_t^* \in \{0, 1, \ldots, t\}$ such that $\gamma^t(d_k^t) < 0 \;\forall k \leq k_t^*$ (implying $C(d_k^t) = 1$) and $\gamma^t(d_k^t) \geq 0 \;\forall k > k_t^*$ (implying
$C(d_k^t) = 2$).

4) Auxiliary result 4 (AR4): $\arg\min_{k = 0, \ldots, t} \Omega^t(d_k^t) = k_t^*$.

Proof: We first show that $\arg\min_{k = 0, \ldots, t} V^t(d_k^t) = k_t^*$. The desired result of AR4 then follows directly. It follows from
AR2 that

$V^t(d_{k+1}^t) - V^t(d_k^t) \geq V^t(d_k^t) - V^t(d_{k-1}^t)$. (46)

By definition of $k_t^*$, $C(d_{k_t^*+1}^t) = 2$ and $C(d_{k_t^*}^t) = 1$. Thus, $V^t(d_{k_t^*+1}^t) = V^t(d_{k_t^*}^t) = \Omega^{t+1}(d_{k_t^*+1}^{t+1})$. Setting $k = k_t^*$
and then $k = k_t^* + 1$ in (46), we get $V^t(d_{k_t^*+2}^t) \geq V^t(d_{k_t^*+1}^t) = V^t(d_{k_t^*}^t) \leq V^t(d_{k_t^*-1}^t)$. Inductively, we conclude
that $V^t(d_k^t)$ is non-increasing for $k < k_t^*$ and non-decreasing for $k > k_t^*$, as desired.
Equipped with our auxiliary results, we will show that

$\arg\min\{\Omega^{t+1}(d + v_1 - x^t), \Omega^{t+1}(d + v_2 - x^t)\} = \arg\min\{\Phi(d + v_1 - x^t), \Phi(d + v_2 - x^t)\}$, (47)

which implies that the myopic policy is optimal, because the left side of (47) is the decision of $\Pi^*$ while the right
side is the decision of the myopic policy in state $d$ in the $t$-th time-slot.

Say the switch is in state $d$ in the $t$-th time-slot and $C(d) = 2$. The states reachable from $d$ in the $(t+1)$-th
time-slot are $d_1 = d + v_1 - x^t$ and $d_2 = d + v_2 - x^t$. Four cases arise, depending on whether $C(d_1)$ and $C(d_2)$
are 1 or 2.

5) $C(d_1) = 2$, $C(d_2) = 1$: Since $V^{T+1}(d) = 0 \;\forall d$, the result is trivially true for $t = T$. Let us consider $t < T$.
It follows by definition that $V^{t+1}(d_1) = \Omega^{t+2}(d_1 + v_2 - x^{t+1})$ and $V^{t+1}(d_2) = \Omega^{t+2}(d_2 + v_1 - x^{t+1})$. However,
$d_1 + v_2 - x^{t+1} = d_2 + v_1 - x^{t+1} = d + v_1 + v_2 - x^t - x^{t+1}$. Thus, $\Omega^{t+1}(d_1) - \Omega^{t+1}(d_2) = \Phi(d_1) - \Phi(d_2)$,
implying (47).

6) $C(d_1) = 2$, $C(d_2) = 2$: Again the result is trivially true for $t = T$. Let us consider $t < T$. Several possibilities
can arise. Since $C(d_2) = 2$, the state in the $(t+2)$-th time-slot is $d_2 + v_2 - x^{t+1}$. The next state is determined by
the optimal choice of configuration in state $d_2 + v_2 - x^{t+1}$ in the $(t+2)$-th time-slot, and so on. In general, we
can construct a chain of states which the switch visits under $\Pi^*$, starting in state $d_2$ in the $(t+1)$-th time-slot. The
chain terminates for one of the following two reasons:

1) The end of the time horizon $T$ is reached.

2) A state is reached where the optimal choice is $v_1$.

For all states constituting the chain except possibly the last, the optimal configuration is $v_2$. The optimal
configuration in state $d_1$ in the $(t+1)$-th time-slot is also $v_2$. Thus, we can construct a similar chain of states originating
at $d_1$, which terminates for one of the two reasons cited above. The chain originating in state $d_1$ comprises
states of the type $d_1 + j v_2 + X^t - X^{t+j}$ ($j = 0, 1, \ldots$) and the chain originating in state $d_2$ comprises
states of the type $d_2 + j v_2 + X^t - X^{t+j}$ ($j = 0, 1, \ldots$).

AR3 implies that the chain originating in state $d_1$ cannot terminate before the chain originating in state $d_2$ due
to reason 2. AR1 implies that the difference in the cost $\Phi$ between the $j$-th states of the two chains is at most
$\Phi(d_1) - \Phi(d_2) = \delta\Phi$. If both chains terminate due to reason (1), we
can show $0 \leq \Omega^{t+1}(d_1) - \Omega^{t+1}(d_2) \leq (T - t + 1)\delta\Phi$, thereby implying (47). Now, suppose the chain originating in
state $d_2$ terminates in the $\tau$-th time-slot ($\tau < T$) due to reason (2). We have two further sub-cases, depending on
whether the optimal choice in the corresponding state of the chain originating in $d_1$ is (i) $v_2$ or (ii) $v_1$.
For sub-case (i), we can show $0 \leq \Omega^{t+1}(d_1) - \Omega^{t+1}(d_2) \leq (\tau - t + 1)\delta\Phi$, thereby implying
(47). Sub-case (ii) cannot arise because we reach a contradiction by virtue of AR4.

7) $C(d_1) = 1$, $C(d_2) = 2$: This case violates AR3 and therefore cannot arise.

8) $C(d_1) = 1$, $C(d_2) = 1$: This case leads to a contradiction similar to the one obtained in sub-case (ii) of case
6) and therefore cannot arise.

By considering a set of collectively exhaustive cases we have shown that (47) holds when $C(d) = 2$. Analogous
arguments can be constructed for the case $C(d) = 1$. It follows that the optimal finite horizon policy for $N = 2$ is
myopic. ■

B. Proof of Lemma 1

Proof: The proof is by induction. We will establish monotonicity of $\gamma_{ij}^t$ as a function of $n_i$. The proof for
monotonicity of $\gamma_{ij}^t$ as a function of $n_j$ follows similarly.

1) Base Case ($t = T$): By definition of $\gamma_{ij}^T$ and our choice of boundary conditions, $\gamma_{ij}^T(n + e_i) - \gamma_{ij}^T(n) =
\psi_i(n_i + 2 - X_i) - 2\psi_i(n_i + 1 - X_i) + \psi_i(n_i - X_i) \geq 0$, where the inequality follows from the convexity of $\psi_i$.

2) Inductive Step: Now, assume that $\gamma_{ij}^{t+1}(n + e_i) \geq \gamma_{ij}^{t+1}(n)$ for some $t < T$ and $i \neq j$. We will establish that
this assumption implies $\gamma_{ij}^t(n + e_i) \geq \gamma_{ij}^t(n)$. By definition,

$\gamma_{ij}^t(n + e_i) = \Omega^{t+1}(n + 2e_i) - \Omega^{t+1}(n + e_i + e_j)$
$\gamma_{ij}^t(n) = \Omega^{t+1}(n + e_i) - \Omega^{t+1}(n + e_j)$. (48)

Several cases arise, depending on the optimal decision in states $n + 2e_i$, $n + e_i + e_j$, $n + e_i$ and $n + e_j$ in the
$(t+1)$-th time-slot. Our inductive assumptions imply that the $(n_i^{t+1}, n_j^{t+1})$ plane gets partitioned into $N + 1$ distinct
connected decision regions by the optimal policy $\Pi^*$. Consequently, as $n_i^{t+1}$ increases with the remaining coordinates
held fixed, the optimal decision switches over from $\mathcal{M}_i$ to $\mathcal{M}_k$ for some $k \neq i$. Thereafter, it never switches back to $\mathcal{M}_i$. This greatly restricts
the number of possible cases we need to consider. We will illustrate a representative case where all four states are
in the interior of the decision region corresponding to $\mathcal{M}_k$. All cases in which one or more of the states of interest
occur at the boundary of two decision regions can be treated as a combination of the representative cases. It follows
from (11),

$\Omega^{t+1}(n + 2e_i) = \Omega^{t+2}(n + 2e_i + e_k) + \Psi(n + 2e_i - X^{t+1})$
$\Omega^{t+1}(n + e_i + e_j) = \Omega^{t+2}(n + e_i + e_j + e_k) + \Psi(n + e_i + e_j - X^{t+1})$
$\Omega^{t+1}(n + e_i) = \Omega^{t+2}(n + e_i + e_k) + \Psi(n + e_i - X^t)$
$\Omega^{t+1}(n + e_j) = \Omega^{t+2}(n + e_j + e_k) + \Psi(n + e_j - X^t)$ (49)

It follows that

$\gamma_{ij}^t(n + e_i) = \gamma_{ij}^{t+1}(n + e_i + e_k) + \Psi(n + 2e_i - X^{t+1}) - \Psi(n + e_i + e_j - X^{t+1})$
$\gamma_{ij}^t(n) = \gamma_{ij}^{t+1}(n + e_k) + \Psi(n + e_i - X^t) - \Psi(n + e_j - X^t)$ (50)

Also, we have from the base case,

$\gamma_{ij}^T(n + e_i) = \Psi(n + 2e_i - X^{t+1}) - \Psi(n + e_i + e_j - X^{t+1})$ (51)
$\gamma_{ij}^T(n) = \Psi(n + e_i - X^t) - \Psi(n + e_j - X^t)$ (52)

Combining, we get

$\gamma_{ij}^t(n + e_i) - \gamma_{ij}^t(n) = [\gamma_{ij}^{t+1}(n + e_i + e_k) - \gamma_{ij}^{t+1}(n + e_k)] + [\gamma_{ij}^T(n + e_i) - \gamma_{ij}^T(n)] \geq 0$,

where the first bracketed term is $\geq 0$ by our inductive assumptions and the second is $\geq 0$ by our base case,
as desired.
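C. Proof of Theorem 2

Proof: Define the time and state dependent quadratic Lyapunov function $\mathcal{L}^t(d^t) = \langle d^t, d^t \rangle$ in state $d^t$ in the
$t$-th time-slot. Given $d^t = d$, let $v^* = \arg\min_{v \in \mathcal{V}} \langle d - x^t, v \rangle$. MSL($\ell$) extracts a partial configuration $v^t$ from $v^*$
by idling all VOQs with deviation $\ell$ or more.

Given that MSL($\ell$) selects partial configuration $v^t$ in the $t$-th time-slot in state $d$, the deviation vector in the
$(t+1)$-th time-slot is $d^{t+1} = d^t - x^t + v^t$. Now define the conditional one step expected drift of the Lyapunov
function by

$\delta_{\mathcal{L}}^t(d) \triangleq E[\mathcal{L}^{t+1}(d^{t+1}) - \mathcal{L}^t(d^t) \mid d^t = d]$. (53)

It follows by definition that

$\mathcal{L}^{t+1}(d^{t+1}) = \mathcal{L}^t(d^t) + 2\langle d^t, v^t - x^t \rangle + \langle v^t - x^t, v^t - x^t \rangle$. (54)

Conditioning on $\{d^t = d\}$ and taking expectation on both sides of (54), we get

$\delta_{\mathcal{L}}^t(d) = 2\langle d, v^t \rangle - 2E[\langle d, x^t \rangle] + \langle v^t, v^t \rangle - 2E[\langle v^t, x^t \rangle] + E[\langle x^t, x^t \rangle]$.

From the linearity of expectation and the definition of the inner product operator,

$\delta_{\mathcal{L}}^t(d) = 2\langle d, v^t \rangle - 2\sum_{j=1}^{N^2} d_j E[x_j^t] + \langle v^t, v^t \rangle - 2\sum_{j=1}^{N^2} v_j^t E[x_j^t] + \sum_{j=1}^{N^2} E[(x_j^t)^2]$. (55)

Finally, invoking (33), we have

$\delta_{\mathcal{L}}^t(d) = 2\langle d, v^t \rangle - 2\langle d, \boldsymbol{\lambda} \rangle + \langle v^t, v^t \rangle - 2\langle \boldsymbol{\lambda}, v^t \rangle + \langle \boldsymbol{\lambda}, \mathbf{1} \rangle$. (56)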

We will now bound each of the above terms individually. 

Note that $\langle v^t, v^t \rangle \leq N$, since a configuration vector has no more than $N$ ones. Recall that a complete
configuration has exactly $N$ ones, but a partial configuration can have fewer than $N$ ones, if it idles some of
the VOQs. Also, note that $\langle \boldsymbol{\lambda}, v^t \rangle \geq 0$, since by definition, both the load vector and the configuration vector have
non-negative entries. Plugging these inequalities into (56), we get

$\delta_{\mathcal{L}}^t(d) \leq 2\langle d, v^t \rangle - 2\langle d, \boldsymbol{\lambda} \rangle + N + \langle \boldsymbol{\lambda}, \mathbf{1} \rangle$. (57)
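Consider the BV decomposition of load vector $\boldsymbol{\lambda}$, given by

$\boldsymbol{\lambda} = \sum_{k=1}^{N!} \alpha_k v_k, \quad \sum_{k=1}^{N!} \alpha_k = \alpha < 1$. (58)

It follows that

$\langle \boldsymbol{\lambda}, \mathbf{1} \rangle = \sum_{k=1}^{N!} \langle \alpha_k v_k, \mathbf{1} \rangle = \sum_{k=1}^{N!} \alpha_k \langle v_k, \mathbf{1} \rangle = \sum_{k=1}^{N!} \alpha_k \cdot N = N\alpha$, (59)

where $\langle v_k, \mathbf{1} \rangle = N$ follows because $v_k$ is a complete configuration. Next, we note that $x^t \in \{0, 1\}^{N^2}$, since all entries
of the target stream profiles are either 0 or 1. As a result, $\langle d, \boldsymbol{\lambda} \rangle \geq \langle d - x^t, \boldsymbol{\lambda} \rangle$. Substituting for $\boldsymbol{\lambda}$ from (58),

$\langle d, \boldsymbol{\lambda} \rangle \geq \langle d - x^t, \boldsymbol{\lambda} \rangle = \sum_{k=1}^{N!} \alpha_k \langle d - x^t, v_k \rangle$. (60)

The definition of the MSL service trace control policy implies

$\langle d - x^t, v^* \rangle \leq \langle d - x^t, v \rangle \;\; \forall v \in \mathcal{V}$. (61)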
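Combining (60) and (61),

$\langle d, \boldsymbol{\lambda} \rangle \geq \sum_{k=1}^{N!} \alpha_k \langle d - x^t, v^* \rangle = \alpha \langle d - x^t, v^* \rangle$. (62)

Now, summing up both sides of (61) over all $v \in \mathcal{V}$ and using the fact that $|\mathcal{V}| = N!$,

$N! \, \langle d - x^t, v^* \rangle \leq \langle d - x^t, \sum_{v \in \mathcal{V}} v \rangle$. (63)

Since each VOQ is served by exactly $(N - 1)!$ complete configurations in $\mathcal{V}$, we have $\sum_{v \in \mathcal{V}} v = (N - 1)! \, \mathbf{1}$, implying

$\langle d - x^t, v^* \rangle \leq \frac{1}{N} \langle d - x^t, \mathbf{1} \rangle$. (64)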
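
The counting step behind (64), namely that every VOQ is served by exactly $(N-1)!$ of the $N!$ complete configurations, can be confirmed by enumeration for small $N$. The sketch below is purely illustrative; the row-major flattening of the $N \times N$ VOQ grid into a vector of length $N^2$ and the function names are assumptions.

```python
from itertools import permutations


def config_vector(perm):
    """0/1 vector of length N^2 (row-major): entry (i, j) is 1 iff input i is
    connected to output j."""
    n = len(perm)
    return [1 if perm[i] == j else 0 for i in range(n) for j in range(n)]


def sum_of_all_configs(n):
    """Entry-wise sum of all N! complete configuration vectors."""
    total = [0] * (n * n)
    for perm in permutations(range(n)):
        for q, bit in enumerate(config_vector(perm)):
            total[q] += bit
    return total


def min_inner_product(y, n):
    """min over complete configurations v of <y, v>, with y of length N^2."""
    return min(sum(yq * vq for yq, vq in zip(y, config_vector(perm)))
               for perm in permutations(range(n)))
```

Every entry of `sum_of_all_configs(n)` equals $(N-1)!$, so the minimum of $\langle y, v \rangle$ over all configurations cannot exceed the average $\frac{1}{N}\langle y, \mathbf{1} \rangle$, which is the content of (64) with $y = d - x^t$.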

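Since all VOQs which are idled under MSL($\ell$) have non-negative updated deviation, $\langle d - x^t, v^t \rangle \leq \langle d - x^t, v^* \rangle$.
Also, $\langle x^t, v^t \rangle \leq N$, since $x_j^t \in \{0, 1\}$ and $\langle v^t, \mathbf{1} \rangle \leq N$. We now use these observations and (62) in (57) to get

$\delta_{\mathcal{L}}^t(d) \leq 2\langle d, v^t \rangle - 2\langle d, \boldsymbol{\lambda} \rangle + N + \langle \boldsymbol{\lambda}, \mathbf{1} \rangle$

$\leq 2\langle d - x^t, v^t \rangle + 2\langle x^t, v^t \rangle - 2\alpha \langle d - x^t, v^* \rangle + N + N\alpha$

$\leq 2\langle d - x^t, v^* \rangle - 2\alpha \langle d - x^t, v^* \rangle + 3N + N\alpha$

$\leq 2(1 - \alpha) \frac{1}{N} \langle d - x^t, \mathbf{1} \rangle + N(3 + \alpha)$

$= 2(1 - \alpha) \frac{1}{N} \langle d, \mathbf{1} \rangle - 2(1 - \alpha) \frac{1}{N} \langle x^t, \mathbf{1} \rangle + N(3 + \alpha)$

$\leq 2(1 - \alpha) \frac{1}{N} \langle d, \mathbf{1} \rangle + N(3 + \alpha)$. (65)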

Before we can proceed further, we need the following lemma. 

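Lemma 2: If the conditional one-step drift of the quadratic Lyapunov function $\mathcal{L}^t(d)$ satisfies $\delta_{\mathcal{L}}^t(d^t) \leq \epsilon \langle d^t, \mathbf{1} \rangle +
B$, $\forall t \geq 0$, $d^t$ and constants $\epsilon > 0$, $B > 0$ (independent of state $d^t$), then

$\limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \sum_{j=1}^{N^2} E[-d_j^t] \leq \frac{B}{\epsilon}$.

Proof: We have,

$\delta_{\mathcal{L}}^t(d) \leq \epsilon \langle d, \mathbf{1} \rangle + B$. (66)

Taking expectations on both sides of (66), using the law of iterated expectations, summing up both sides for
$\tau = 0, \ldots, T - 1$, assuming $d^0 = 0$, and using $\mathcal{L} \geq 0$, we get

$\sum_{\tau=0}^{T-1} \langle -E[d^\tau], \mathbf{1} \rangle \leq \frac{BT}{\epsilon}$. (67)

Dividing both sides of (67) by $T$ and taking $\limsup$, it follows

$\limsup_{T \to \infty} \frac{1}{T} \sum_{\tau=0}^{T-1} \sum_{j=1}^{N^2} E[-d_j^\tau] \leq \frac{B}{\epsilon}$.

■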

As seen from (65), the Lyapunov drift satisfies the condition of Lemma 2 with $\epsilon = 2(1 - \alpha)/N > 0$ and $B =
N(3 + \alpha) > 0$. Since the deviation of any VOQ under the MSL($\ell$) policy is upper bounded by $\ell \geq 0$, we get from
Lemma 2, $\forall j$,

$\limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} E[-d_j^t] \leq \frac{B}{\epsilon} + (N^2 - 1)\ell < \infty$, (68)

implying $\liminf_{t \to \infty} E[d_j^t] > -\infty \;\forall j$, as desired. ■
D. Proof of Theorem 3

Proof: It can be shown using Lyapunov methods that LLF guarantees finite lags to all meta-queues in the
single server model of Section IV-B if the inverses of the average inter-departure time targets of all meta-queues
sum to less than 1. The proof of this result is very similar to the proof of Theorem 2 and is therefore omitted.
Bounded lags for all meta-queues in the single server model imply bounded lags for all VOQs in the switch, since
the lag of a meta-queue is the maximum of the lags of its constituent VOQs (see Section IV-E). Under uniform
i.i.d. loading, i.e., $\boldsymbol{\lambda} = (\lambda/N)\mathbf{1}$ for some $\lambda < 1$, the average inter-departure time target for every VOQ, and hence
for every meta-queue in every configuration subset, is $N/\lambda$. Consequently, the inverses of the average
meta-queue inter-departure time targets sum to $\lambda < 1$, implying the desired result. ■



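E. Proof of Theorem 4

Proof: Consider $\mathcal{L}^t(d^t)$ and $\delta_{\mathcal{L}}^t(d)$, as defined in the proof of Theorem 2. Assume that the switch operates
in configuration subset $S_v = \{C^k(v)\}_{k=0}^{N-1}$. Let

$k^* = \arg\min_{k = 0, \ldots, N-1} \langle d - x^t, C^k(v) \rangle$. (69)

MSL($\ell$)-SS selects a partial configuration $v^t$ which is extracted from the complete configuration $v^* = C^{k^*}(v)$. Following
the arguments in the proof of Theorem 2, we get from (57)

$\delta_{\mathcal{L}}^t(d) \leq 2\langle d, v^t \rangle - 2\langle d, \boldsymbol{\lambda} \rangle + N + \langle \boldsymbol{\lambda}, \mathbf{1} \rangle$. (70)

As always, we will bound each of these terms individually. Since $\boldsymbol{\lambda} = \lambda \mathbf{1}$ for some $\lambda < 1/N$ for uniform loading,
it follows that $\langle \boldsymbol{\lambda}, \mathbf{1} \rangle = \lambda N^2$. Next, it follows from the definition of MSL-SS that

$\langle d - x^t, v^* \rangle \leq \langle d - x^t, C^k(v) \rangle \;\; \forall k \in \{0, 1, \ldots, N - 1\}$. (71)

Summing both sides of the equation over $k$, we get from (71)

$N \langle d - x^t, v^* \rangle \leq \langle d - x^t, \sum_{k=0}^{N-1} C^k(v) \rangle = \langle d - x^t, \mathbf{1} \rangle \leq \langle d, \mathbf{1} \rangle$. (72)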

Using A = Al once again, we get (d, A) = A(d, 1). Since all VOQs which are idled under the MSL(£)-SS policy 
have non-negative updated deviation, (d — x*,v*) < (d — x*,v*). Also, (x*,v*) < N, since x\ G {0,1} and 
(v*, 1) < 1. We now use the these observations and (172] ) in (iTOl ) to get 

(5^(d) < 2(d,v'^)-2(d,A) + iV + (A,l) 

= 2(d - X*, V*) + 2(x*, V*) - 2A(d, 1) + iV + iV^A 

< 2(d-x*,v*) -2A(d,l) + 3iV + iV^A 

< ^(d,l) -2A(d,l) + iV(3 + iVA) 

= 2(l-ArA)^(d,l)+iV(3 + iVA). (73) 

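Thus, we have established $\delta^t_{\mathcal{L}}(\mathbf{d}) \le \epsilon \langle \mathbf{d}, \mathbf{1} \rangle + B$, for $\epsilon = 2(1 - N\lambda)/N$ and $B = N(3 + N\lambda) > 0$. Since the 
deviation of any VOQ under the MSL($\xi$)-SS policy is upper bounded by $\xi > 0$, it follows from Lemma 2 that, for all $j$, 

$$\limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} E[-d_j^t] \le \frac{B}{\epsilon} + (N^2 - 1)\xi < \infty, \tag{74}$$

implying $\liminf_{t \to \infty} E[d_j^t] > -\infty$ for all $j$, as desired. 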
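
One way to see where the constant on the right-hand side of (74) comes from is to combine the drift bound with the cap $\xi$ on the deviation of each of the $N^2$ VOQs. This is a heuristic reading under the stated bounds, not a restatement of Lemma 2, whose statement is not reproduced here:

```latex
\begin{aligned}
\langle \mathbf{d}, \mathbf{1} \rangle
  &= d_j + \sum_{i \neq j} d_i \;\le\; d_j + (N^2 - 1)\xi, \\
\delta^t_{\mathcal{L}}(\mathbf{d})
  &\le \epsilon \langle \mathbf{d}, \mathbf{1} \rangle + B
   \;\le\; \epsilon \bigl( d_j + (N^2 - 1)\xi \bigr) + B .
\end{aligned}
```

Since $\epsilon > 0$, the last expression is negative whenever $-d_j > B/\epsilon + (N^2 - 1)\xi$, i.e., whenever the lag of VOQ $j$ exceeds the bound appearing in (74).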

F. Proof of Theorem 5 

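Proof: Define $\delta^t_{\mathcal{L}}(\mathbf{d}, k)$ as the expected drift in the Lyapunov function $\mathcal{L}^t(\cdot)$ conditioned on the deviation 
vector $\mathbf{d}$ and the choice of configuration subset $\mathcal{S}_k$ in the $t^{\text{th}}$ time-slot. If partial configuration $\mathbf{v}^*(k)$, derived from 
configuration $C^{\ell^*(k)}(\mathbf{v}_k) \in \mathcal{S}_k$, is chosen in the $t^{\text{th}}$ time-slot, $\delta^t_{\mathcal{L}}(\mathbf{d}, k)$ is given by (56), with $\mathbf{v}^*$ replaced by $\mathbf{v}^*(k)$. 
Unconditioning with respect to the choice of configuration subset $\mathcal{S}_k$ yields 

$$\begin{aligned}
\delta^t_{\mathcal{L}}(\mathbf{d}) &= \sum_{k=1}^{(N-1)!} \theta_k \, \delta^t_{\mathcal{L}}(\mathbf{d}, k) \\
&= -2\langle \mathbf{d}, \boldsymbol{\lambda} \rangle - 2 \underbrace{\sum_{k=1}^{(N-1)!} \theta_k \langle C^{\ell^*(k)}(\mathbf{v}_k), \boldsymbol{\lambda} \rangle}_{\ge 0} + 2 \sum_{k=1}^{(N-1)!} \theta_k \langle \mathbf{d}, C^{\ell^*(k)}(\mathbf{v}_k) \rangle + \langle \boldsymbol{\lambda}, \mathbf{1} \rangle + N \\
&\le -2\langle \mathbf{d}, \boldsymbol{\lambda} \rangle + 2W^t + N(1 + \alpha). 
\end{aligned} \tag{75}$$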

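We will bound the terms $\langle \mathbf{d}, \boldsymbol{\lambda} \rangle$ and $W^t$ to arrive at the desired result. It follows from the definition of the MSL-RS 
policy that 

$$\langle \mathbf{d} - \mathbf{x}^t, C^{\ell^*(k)}(\mathbf{v}_k) \rangle \le \langle \mathbf{d} - \mathbf{x}^t, C^{\ell}(\mathbf{v}_k) \rangle \quad \forall \ell \in \{0, 1, \ldots, N-1\}. \tag{76}$$

Summing both sides of the inequality over $\ell$, and using the fact that the configurations in a subset sum to the all-ones vector, we get 

$$\langle \mathbf{d} - \mathbf{x}^t, C^{\ell^*(k)}(\mathbf{v}_k) \rangle \le \frac{1}{N} \langle \mathbf{d}, \mathbf{1} \rangle. \tag{77}$$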

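Since $x_j^t \in \{0, 1\}$ and a configuration vector has no more than $N$ ones, for any configuration vector $\mathbf{v}$, we have 
$\langle \mathbf{v}, \mathbf{x}^t \rangle \le N$. Using this and the fact that $\{\theta_k\}$ is a probability distribution (implying $\sum_k \theta_k = 1$), we get 

$$\sum_{k=1}^{(N-1)!} \theta_k \langle C^{\ell^*(k)}(\mathbf{v}_k), \mathbf{x}^t \rangle \le \sum_{k=1}^{(N-1)!} \theta_k \cdot N = N. \tag{78}$$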

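From (77), it follows that 

$$\langle \mathbf{d}, C^{\ell^*(k)}(\mathbf{v}_k) \rangle \le \frac{1}{N} \langle \mathbf{d}, \mathbf{1} \rangle + \langle C^{\ell^*(k)}(\mathbf{v}_k), \mathbf{x}^t \rangle. \tag{79}$$

Taking the expectation on both sides of the above inequality with respect to the distribution $\{\theta_k\}$ and invoking the 
definition of $W^t$, we get 

$$W^t \le \frac{1}{N} \langle \mathbf{d}, \mathbf{1} \rangle + N. \tag{80}$$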

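Now, consider the BV decomposition of load vector $\boldsymbol{\lambda}$ as given by (35), i.e., 

$$\boldsymbol{\lambda} = \sum_{k=1}^{(N-1)!} \sum_{\ell=0}^{N-1} \zeta_{\ell k} \, C^{\ell}(\mathbf{v}_k), \qquad \zeta = \sum_{k=1}^{(N-1)!} \sum_{\ell=0}^{N-1} \zeta_{\ell k}. \tag{81}$$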
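Using $\langle \boldsymbol{\lambda}, \mathbf{x}^t \rangle \ge 0$, the definition of MSL-RS in (76), and the definition of $\theta_k$ in (36), 

$$\begin{aligned}
\langle \mathbf{d}, \boldsymbol{\lambda} \rangle \ge \langle \mathbf{d} - \mathbf{x}^t, \boldsymbol{\lambda} \rangle &= \sum_{k=1}^{(N-1)!} \sum_{\ell=0}^{N-1} \zeta_{\ell k} \langle \mathbf{d} - \mathbf{x}^t, C^{\ell}(\mathbf{v}_k) \rangle \\
&\ge \sum_{k=1}^{(N-1)!} \sum_{\ell=0}^{N-1} \zeta_{\ell k} \langle \mathbf{d} - \mathbf{x}^t, C^{\ell^*(k)}(\mathbf{v}_k) \rangle \\
&= \sum_{k=1}^{(N-1)!} \langle \mathbf{d} - \mathbf{x}^t, C^{\ell^*(k)}(\mathbf{v}_k) \rangle \sum_{\ell=0}^{N-1} \zeta_{\ell k} \\
&= \zeta \sum_{k=1}^{(N-1)!} \theta_k \langle \mathbf{d} - \mathbf{x}^t, C^{\ell^*(k)}(\mathbf{v}_k) \rangle = \zeta W^t. 
\end{aligned} \tag{82}$$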



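Substituting (80) and (82) into (75) and noting that $\alpha = \zeta$, 

$$\begin{aligned}
\delta^t_{\mathcal{L}}(\mathbf{d}) &\le -2\zeta W^t + 2W^t + N(1 + \zeta) \\
&= 2(1 - \zeta) W^t + N(1 + \zeta) \\
&\le \frac{2(1 - \zeta)}{N} \langle \mathbf{d}, \mathbf{1} \rangle + 2N(1 - \zeta) + N(1 + \zeta) \\
&= \frac{2(1 - \zeta)}{N} \langle \mathbf{d}, \mathbf{1} \rangle + N(3 - \zeta). 
\end{aligned} \tag{83}$$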

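We have shown that $\delta^t_{\mathcal{L}}(\mathbf{d}) \le \epsilon \langle \mathbf{d}, \mathbf{1} \rangle + B$, for $\epsilon = 2(1 - \zeta)/N > 0$ and $B = N(3 - \zeta) > 0$. Since the deviation 
of any VOQ under the MSL($\xi$)-RS policy is upper bounded by $\xi > 0$, it follows from Lemma 2 that, for all $j$, 

$$\limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} E[-d_j^t] \le \frac{B}{\epsilon} + (N^2 - 1)\xi < \infty, \tag{84}$$

implying $\liminf_{t \to \infty} E[d_j^t] > -\infty$ for all $j$, as desired. 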



G. Proof of Theorem 6 

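Proof: Suppose MSL($\xi$) is used in the $t^{\text{th}}$ time-slot, followed by MSL($\xi$)-SS in the $(t+1)^{\text{th}}, \ldots, (t+P-1)^{\text{th}}$ 
time-slots, and MSL($\xi$) again in the $(t+P)^{\text{th}}$ time-slot. We are interested in computing the $(P+1)$-step conditional 
expected drift in the Lyapunov function $\mathcal{L}^t(\mathbf{d}^t)$ of Theorem 2, given by 

$$\begin{aligned}
\delta^t_{\mathcal{L}} &= E\big[\mathcal{L}^{t+P+1}(\mathbf{d}^{t+P+1}) - \mathcal{L}^t(\mathbf{d}^t) \,\big|\, \mathbf{d}^t = \mathbf{d}\big] \\
&= \sum_{p=0}^{P} \underbrace{E\big[\mathcal{L}^{t+p+1}(\mathbf{d}^{t+p+1}) - \mathcal{L}^{t+p}(\mathbf{d}^{t+p}) \,\big|\, \mathbf{d}^t = \mathbf{d}\big]}_{\mu_p}. 
\end{aligned} \tag{85}$$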



The key idea is to bound each term $\mu_p$ individually and then obtain a bound on their sum of the form (66). For 
ease of exposition, we illustrate the case $P = 2$. The proof extends in a straightforward fashion to $P > 2$ (with more 
algebra). 

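Suppose (partial) configuration $\mathbf{v}^*_t$ is selected in the $t^{\text{th}}$ time-slot.¹ From Theorem 2, we have 

$$\mu_0 \le 2\langle \mathbf{d}, \mathbf{v}^*_t \rangle - 2\langle \mathbf{d}, \boldsymbol{\lambda} \rangle + N(1 + \alpha). \tag{86}$$

From the definition of $\mathcal{L}$ and $\mathbf{d}^{t+1} = \mathbf{d}^t + \mathbf{v}^*_t - \mathbf{x}^t$, it follows that 

$$\mu_1 \le 2\langle \mathbf{d}, \mathbf{v}^*_{t+1} \rangle - 2\langle \mathbf{d}, \boldsymbol{\lambda} \rangle + 3N(1 + \alpha). \tag{87}$$

¹So far, we have been suppressing the time dependence of $\mathbf{v}^*$ in all the proofs, since we were only considering one-step drifts. 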
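By definition, MSL($\xi$)-SS is used in the $(t+1)^{\text{th}}$ time-slot on the subset generated by (the complete configuration 
corresponding to) $\mathbf{v}^*_t$. This can be used to show $\langle \mathbf{d}, \mathbf{v}^*_{t+1} - \mathbf{v}^*_t \rangle \le 2N\alpha$. It follows that 

$$\mu_1 \le 2\langle \mathbf{d}, \mathbf{v}^*_t \rangle - 2\langle \mathbf{d}, \boldsymbol{\lambda} \rangle + 3N + 5N\alpha. \tag{88}$$

Next, since MSL($\xi$) is used in the $(t+2)^{\text{th}}$ slot, it follows that 

$$\mu_2 \le \frac{2}{N}(1 - \alpha)\langle \mathbf{d}, \mathbf{1} \rangle + 7N + 11N\alpha. \tag{89}$$

Finally, from Theorem 2, $\langle \mathbf{d}, \mathbf{v}^*_t \rangle - \langle \mathbf{d}, \boldsymbol{\lambda} \rangle \le \frac{1}{N}(1 - \alpha)\langle \mathbf{d}, \mathbf{1} \rangle + N$. Combining the above inequalities, we get 

$$\delta^t_{\mathcal{L}} \le \frac{6}{N}(1 - \alpha)\langle \mathbf{d}, \mathbf{1} \rangle + 15N + 17N\alpha. \tag{90}$$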



The desired result now follows from arguments similar to those presented in the proof of Theorem 2. A similar 
bound can be established for any $P > 2$, with $\epsilon = 2(P + 1)(1 - \alpha)/N$ and a constant $B$ which is an increasing 
function of $P$. 
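
The bookkeeping that combines (86), (88), (89) and the Theorem 2 bound into (90) can be replayed mechanically. The sketch below is only an arithmetic check under an assumed encoding: each bound is stored as a triple of coefficients multiplying $(1-\alpha)\langle \mathbf{d}, \mathbf{1} \rangle / N$, $N$ and $N\alpha$, the leading coefficient of (89) is taken to be $2/N$, and the helper names are hypothetical.

```python
def via_theorem2(const_n, const_n_alpha):
    """Bound 2(<d,v> - <d,lambda>) + const_n * N + const_n_alpha * N * alpha.

    Applies <d,v> - <d,lambda> <= (1/N)(1 - alpha)<d,1> + N, so the factor 2
    adds 2 to the first coefficient and 2 to the coefficient of N.
    Returns (coef of (1-alpha)<d,1>/N, coef of N, coef of N*alpha).
    """
    return (2, 2 + const_n, const_n_alpha)

def three_step_drift_bound():
    """Sum the bounds on mu_0, mu_1 and mu_2 for the case P = 2."""
    mu0 = via_theorem2(1, 1)   # (86): ... + N(1 + alpha)
    mu1 = via_theorem2(3, 5)   # (88): ... + 3N + 5N*alpha
    mu2 = (2, 7, 11)           # (89): already expressed in <d,1>
    return tuple(sum(c) for c in zip(mu0, mu1, mu2))

if __name__ == "__main__":
    print(three_step_drift_bound())  # (6, 15, 17)
```

The triple corresponds to $\frac{6}{N}(1-\alpha)\langle \mathbf{d}, \mathbf{1} \rangle + 15N + 17N\alpha$, which agrees with $\epsilon = 2(P+1)(1-\alpha)/N$ at $P = 2$.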